cloudquery / cloudquery
The open source high performance ELT framework powered by Apache Arrow
Home Page: https://cloudquery.io
License: Mozilla Public License 2.0
When running cloudquery init, it is sometimes not clear what is happening or why it takes so long (depending on the internet connection). I suggest adding a periodic status print (how much has been downloaded and how much is left).
Good day, team at cloudquery.
Currently cloudquery supports fetching a list of RDS clusters. Each cluster comes with a subnet group field.
We want to know what possible subnets the RDS instances could be deployed in, and that would require cloudquery to fetch subnet group details by further calling the DescribeDBSubnetGroups API.
We should be able to see a list of AWS EC2 subnet IDs that are tied to each RDS cluster subnet group.
I'm trying to use command line flags to specify the driver and dsn, but it doesn't look like it's getting picked up. Or maybe I'm doing it wrong. When using environment variables it does work!
This doesn't work; data is still being put in the SQLite database:
cloudquery fetch --driver postgresql --dsn postgresql://localhost:5432/cloudquery
while this does work:
CQ_DRIVER=postgresql CQ_DSN=postgresql://localhost:5432/cloudquery cloudquery fetch
Using Postgres
./cloudquery fetch --driver postgresql --dsn "host=localhost user=postgres password=<redacted> DB.name=postgres port=5432" --path azure_config.yml
2021-02-01T09:54:06.887-0500 INFO Creating tables if needed {"provider": "azure"}
2021/02/01 09:54:08 /go/src/github.com/troian/golang-cross-example/database/database.go:183 ERROR: ON CONFLICT DO UPDATE requires inference specification or constraint name (SQLSTATE 42601)
[1.785ms] [rows:0] INSERT INTO "azure_resources_group_tags" ("group_id","key1","value1") VALUES (6,'key2','value2'),(6,'key3','value3'),(6,'key4','value4'),(6,'key5','value5'),(6,'key6','value6'),(6,'key7','value7') ON CONFLICT DO UPDATE SET "group_id"="excluded"."group_id"
2021/02/01 09:54:08 /go/src/github.com/troian/golang-cross-example/database/database.go:183 ERROR: ON CONFLICT DO UPDATE requires inference specification or constraint name (SQLSTATE 42601)
[9.274ms] [rows:6] INSERT INTO "azure_resources_groups" ("subscription_id","resource_id","name","type","properties_provisioning_state","location","managed_by") VALUES ('<redacted subscription id>','/subscriptions/<redacted subscription id>/resourceGroups/NetworkWatcherRG','NetworkWatcherRG','Microsoft.Resources/resourceGroups','Succeeded','westus',NULL),('<redacted subscription id>,'/subscriptions/<redacted subscription id>/resourceGroups/group-name,'group-name','Microsoft.Resources/resourceGroups','Succeeded','eastus',NULL),('<redacted subscription id>','/subscriptions/<redacted subscription id>/resourceGroups/group-name','group-name','Microsoft.Resources/resourceGroups','Succeeded','eastus',NULL),('<redacted subscription id>,'/subscriptions/<redacted subscription id>/resourceGroups/group-name','group-name','Microsoft.Resources/resourceGroups','Succeeded','westus',NULL),('<redacted subscription id>','/subscriptions/<redacted subscription id>/resourceGroups/cloud-shell-storage-eastus','cloud-shell-storage-eastus','Microsoft.Resources/resourceGroups','Succeeded','eastus',NULL),('<redacted subscription id>','/subscriptions/<redacted subscription id>/resourceGroups/group-name','group-name','Microsoft.Resources/resourceGroups','Succeeded','westus',NULL) RETURNING "id"
2021-02-01T09:54:08.723-0500 INFO Fetched resources {"provider": "azure", "subscription_id": "<redacted subscription id>", "resource": "resources.groups", "count": 6}
Config:
providers:
- name: azure
subscriptions:
- "<redacted subscription id>"
resources:
- name: resources.groups
When running the initial cloudquery gen config aws command after downloading in Ubuntu (WSL2), it fails. It looks like one of the default settings is not working. Details:
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
$ ./cloudquery gen config aws
11:11PM INF logging configured consoleLog=true fileLogging=true fileName=cloudquery.log jsonLogOutput=false logDirectory=. maxAgeInDays=3 maxBackups=3 maxSizeMB=30 verbose=false
11:11PM ERR failed to get providers configuration error="failed reading reattach config from CQ_REATTACH_PROVIDERS=/home/mike: read /home/mike: is a directory" provider=aws
Error: failed reading reattach config from CQ_REATTACH_PROVIDERS=/home/mike: read /home/mike: is a directory
Usage:
cloudquery gen config [choose one or more providers (aws,gcp,azure,okta,...)] [flags]
Flags:
--append append new providers to existing config file
--force override output
-h, --help help for config
--path string path to output generated config file (default "./config.yml")
Global Flags:
--enableConsoleLog Enable console logging (default true)
--enableFileLogging enableFileLogging makes the framework logging to a file (default true)
--encodeLogsAsJson EncodeLogsAsJson makes the logging framework logging JSON
--logDirectory string Directory to logging to to when file logging is enabled (default ".")
--logFile string Filename is the name of the logfile which will be placed inside the directory (default "cloudquery.log")
--maxAge int MaxAge the max age in days to keep a logfile (default 3)
--maxBackups int MaxBackups the max number of rolled files to keep (default 3)
--maxSize int MaxSize the max size in MB of the logfile before it's rolled (default 30)
--plugin-dir string Directory to save and load CloudQuery plugins from (env: CQ_PLUGIN_DIR) (default "/home/mike")
--reattach-providers string Path to reattach unmanaged plugins, mostly used for testing purposes (env: CQ_REATTACH_PROVIDERS) (default "/home/mike")
-v, --verbose Enable Verbose logging
2021/04/15 23:11:30 failed reading reattach config from CQ_REATTACH_PROVIDERS=/home/mike: read /home/mike: is a directory
uname -a
Darwin YYY 19.6.0 Darwin Kernel Version 19.6.0: Thu Oct 29 22:56:45 PDT 2020; root:xnu-6153.141.2.2~1/RELEASE_X86_64 x86_64
./cloudquery version
Version: 0.4.1
Commit: 057f16e8c81ce6ce92898bf92f9ecdcc099bd96b
Date: 2020-12-15 20:25:48.052503 +0100 CET m=+0.161209429
./cloudquery fetch
<...>
2020-12-15T20:23:48.963+0100 INFO Fetched resources {"provider": "aws", "account_id": "XXX", "region": "eu-central-1", "resource": "ec2.instances", "count": 1}
2020-12-15T20:23:48.965+0100 INFO Fetched resources {"provider": "aws", "account_id": "XXX", "region": "eu-central-1", "resource": "ec2.instances", "count": 1}
2020-12-15T20:23:48.975+0100 INFO Fetched resources {"provider": "aws", "account_id": "XXX", "region": "eu-central-1", "resource": "redshift.clusters", "count": 0}
2020-12-15T20:23:48.983+0100 INFO Fetched resources {"provider": "aws", "account_id": "XXX", "region": "eu-central-1", "resource": "ec2.images", "count": 2}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4a590e6]
goroutine 59 [running]:
github.com/cloudquery/cloudquery/providers/aws/elasticbeanstalk.(*Client).transformEnvironmentResources(0xc000922870, 0x0, 0x0)
/go/src/github.com/troian/golang-cross-example/providers/aws/elasticbeanstalk/enironments.go:127 +0x26
github.com/cloudquery/cloudquery/providers/aws/elasticbeanstalk.(*Client).transformEnvironment(0xc000922870, 0xc001608000, 0x45fc1ba)
/go/src/github.com/troian/golang-cross-example/providers/aws/elasticbeanstalk/enironments.go:150 +0x8c
github.com/cloudquery/cloudquery/providers/aws/elasticbeanstalk.(*Client).transformEnvironments(0xc000922870, 0xc000635238, 0x1, 0x1, 0x0, 0x0, 0xc001604c30)
/go/src/github.com/troian/golang-cross-example/providers/aws/elasticbeanstalk/enironments.go:162 +0x7e
github.com/cloudquery/cloudquery/providers/aws/elasticbeanstalk.(*Client).environments(0xc000922870, 0x5b56a00, 0x0, 0x0, 0x10)
/go/src/github.com/troian/golang-cross-example/providers/aws/elasticbeanstalk/enironments.go:192 +0x2d7
github.com/cloudquery/cloudquery/providers/aws/elasticbeanstalk.(*Client).CollectResource(0xc000922870, 0xc000039291, 0xc, 0x5b56a00, 0x0, 0x1, 0xc0000a3880)
/go/src/github.com/troian/golang-cross-example/providers/aws/elasticbeanstalk/client.go:39 +0xf9
github.com/cloudquery/cloudquery/providers/aws.(*Provider).collectResource(0xc0003c05a0, 0xc000925090, 0xc000039280, 0x1d, 0x5b56a00, 0x0)
/go/src/github.com/troian/golang-cross-example/providers/aws/provider.go:213 +0x1ef
created by github.com/cloudquery/cloudquery/providers/aws.(*Provider).Run
/go/src/github.com/troian/golang-cross-example/providers/aws/provider.go:147 +0x1fd
I love the idea behind this project and think it will be incredibly useful for many organizations. I'm hoping to be a contributor here, so let me know how I can help!
When running the example.config.yml, the ec2.instances resource attempts to create a column named 'capacity_reservation_target_capacity_reservation_resource_group_arn'.
Error: Error 1059: Identifier name 'capacity_reservation_target_capacity_reservation_resource_group_arn' is too long
MySQL has a 64 character limit for identifiers, so we should find a way to identify and compact names > 64 characters long when using the MySQL driver. For the sake of autocomplete and api similarity, I actually think simply taking the first 64 characters will be the best bet in most situations.
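One caveat with taking the first 64 characters: two long names that share a 64-character prefix would collide. A common variant keeps the prefix but appends a short hash of the full name to disambiguate. A minimal sketch (truncateIdentifier is a hypothetical helper, not cloudquery's API):

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

const maxIdentifierLen = 64 // MySQL's identifier length limit

// truncateIdentifier keeps names within the limit. Instead of plain prefix
// truncation, it reserves 9 characters for an underscore plus the first
// 4 bytes of a SHA-1 of the full name, so distinct long names stay distinct.
func truncateIdentifier(name string) string {
	if len(name) <= maxIdentifierLen {
		return name
	}
	sum := sha1.Sum([]byte(name))
	suffix := "_" + hex.EncodeToString(sum[:4]) // 9 characters total
	return name[:maxIdentifierLen-len(suffix)] + suffix
}

func main() {
	long := "capacity_reservation_target_capacity_reservation_resource_group_arn"
	fmt.Println(truncateIdentifier(long), len(truncateIdentifier(long)))
}
```

The trade-off versus plain prefix truncation is slightly less readable names in exchange for collision safety; autocomplete still works on the shared prefix.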
AWS response "UnauthorizedOperation: You are not authorized to perform this operation" doesn't seem to be handled properly. Cloudquery abruptly stops when it gets such a response message:
2021/01/16 17:39:57 UnauthorizedOperation: You are not authorized to perform this operation.
status code: 403, request id: d001de6e-cb48-446e-a402-8fc8c67ff275
It appears that the closure of Rows is neither deferred nor is the last error returned by the function. Please dismiss this if incorrect; otherwise I'll open a PR later to fix this and other resource leaks like it, which can be picked up by static analysis tooling or manually.
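For reference, the pattern being requested looks like the sketch below. It uses a small interface mirroring the relevant methods of database/sql.Rows so it runs without a live database; the real *sql.Rows satisfies the same shape:

```go
package main

import "fmt"

// rows mirrors the subset of database/sql.Rows methods used here, so the
// pattern can be demonstrated without a live database.
type rows interface {
	Next() bool
	Scan(dest ...interface{}) error
	Err() error
	Close() error
}

// collect shows both fixes: Close is deferred so it runs on every return
// path, and Err() is checked after the loop so a mid-iteration error from
// the driver is not silently dropped.
func collect(r rows) (out []string, err error) {
	defer func() {
		if cerr := r.Close(); err == nil {
			err = cerr
		}
	}()
	for r.Next() {
		var s string
		if err := r.Scan(&s); err != nil {
			return nil, err
		}
		out = append(out, s)
	}
	return out, r.Err()
}

// fakeRows is a trivial in-memory stand-in used to exercise collect.
type fakeRows struct {
	vals   []string
	i      int
	closed bool
}

func (f *fakeRows) Next() bool { f.i++; return f.i <= len(f.vals) }
func (f *fakeRows) Scan(dest ...interface{}) error {
	*(dest[0].(*string)) = f.vals[f.i-1]
	return nil
}
func (f *fakeRows) Err() error   { return nil }
func (f *fakeRows) Close() error { f.closed = true; return nil }

func main() {
	f := &fakeRows{vals: []string{"a", "b"}}
	got, err := collect(f)
	fmt.Println(got, err, f.closed)
}
```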
Story
As a report builder
I want to use Google's BigQuery
So that I can benefit from BigQuery's capabilities
Why use BigQuery
Distributor ID: Ubuntu
Description: Ubuntu 20.04 LTS
Release: 20.04
Version: 0.9.6
Commit: 0102920
Date: 2021-02-26 17:19:16.512656639 +0000 UTC m=+0.028906351
Running the setup per the readme, I can fetch successfully and populate a local SQLite DB. After generating the policy for aws_cis, when I run cloudquery query, I get a panic:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0xcd4e3c]
goroutine 1 [running]:
github.com/cloudquery/cloudquery/cloudqueryclient.(*Client).RunQuery(0xc000422300, 0x107c55a, 0xc, 0x0, 0x0, 0xc000422300, 0x0)
/home/ubuntu/sauce/cloudquery/cloudqueryclient/client.go:231 +0x6c
github.com/cloudquery/cloudquery/cmd.glob..func7(0x19601a0, 0x19f9c58, 0x0, 0x0, 0x0, 0x0)
/home/ubuntu/sauce/cloudquery/cmd/query.go:24 +0x316
github.com/spf13/cobra.(*Command).execute(0x19601a0, 0x19f9c58, 0x0, 0x0, 0x19601a0, 0x19f9c58)
/home/ubuntu/go/pkg/mod/github.com/spf13/[email protected]/command.go:850 +0x460
github.com/spf13/cobra.(*Command).ExecuteC(0x1960440, 0x1086c18, 0x18, 0x0)
/home/ubuntu/go/pkg/mod/github.com/spf13/[email protected]/command.go:958 +0x349
github.com/spf13/cobra.(*Command).Execute(...)
/home/ubuntu/go/pkg/mod/github.com/spf13/[email protected]/command.go:895
github.com/cloudquery/cloudquery/cmd.Execute()
/home/ubuntu/sauce/cloudquery/cmd/root.go:24 +0x2d
main.main()
/home/ubuntu/sauce/cloudquery/main.go:14 +0x82
Trying with Postgres, I get the same error. Is there something I've missed?
When running with Postgres as the database, running a second time fails when attempting to update the schema:
2020/12/27 07:48:33 /go/src/github.com/troian/golang-cross-example/providers/aws/ec2/vpc_peering_connections.go:193 ERROR: column "requester_option_allow_egress_from_local_classic_link_to_remote" of relation "aws_ec2_vpc_peering_connections" already exists (SQLSTATE 42701)
[157.994ms] [rows:0] ALTER TABLE "aws_ec2_vpc_peering_connections" ADD "requester_option_allow_egress_from_local_classic_link_to_remote_vpc" boolean
Error: ERROR: column "requester_option_allow_egress_from_local_classic_link_to_remote" of relation "aws_ec2_vpc_peering_connections" already exists (SQLSTATE 42701)
It would be useful to have a policy pack for GCP similar to AWS CIS. Some of the findings from GCP Security Command Center can be helpful for a start: https://cloud.google.com/security-command-center/docs/concepts-vulnerabilities-findings.
Error: Error 1059: Identifier name 'accepter_option_allow_egress_from_local_classic_link_to_remote_vpc' is too long
Hi,
Is encrypting the credentials on the roadmap? I just configured cloudquery to use with Azure, and it is a risk to put the credentials in plain text in environment variables.
Thanks,
Alonso
If you hit the rate limits for a resource API, CloudQuery stops fetching data and errors out. It would be great if the CloudQuery AWS provider would detect this type of error and retry the request.
It would be useful to have an official Docker image which can be used to run the cloudquery tool. This would allow executing the tool using Kubernetes or a Cloud Run service.
➜ ~ uname -a
Darwin MacBook-Pro.local 19.5.0 Darwin Kernel Version 19.5.0: Tue May 26 20:41:44 PDT 2020; root:xnu-6153.121.2~2/RELEASE_X86_64 x86_64
➜ ~ ./cloudquery version
Version: 0.4.6
Commit: 96e6541ee8c8a86e8bb4b580bf7bb526ce3ec68a
Date: 2020-12-17 14:03:28.112132 -0500 EST m=+0.011338626
➜ ~ ./cloudquery gen config gcp
➜ ~ ./cloudquery fetch
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x530fbcd]
goroutine 1 [running]:
github.com/cloudquery/cloudquery/providers/gcp/compute.(*Client).transformInstance(0xc000431c40, 0xc000147200, 0x9)
/go/src/github.com/troian/golang-cross-example/providers/gcp/compute/instances.go:420 +0x1ad
github.com/cloudquery/cloudquery/providers/gcp/compute.(*Client).transformInstances(0xc000431c40, 0xc0005ee520, 0x2, 0x4, 0x0, 0x0, 0x0)
/go/src/github.com/troian/golang-cross-example/providers/gcp/compute/instances.go:446 +0x7e
github.com/cloudquery/cloudquery/providers/gcp/compute.(*Client).instances(0xc000431c40, 0x5b57720, 0x0, 0x5e5a7e0, 0x0)
/go/src/github.com/troian/golang-cross-example/providers/gcp/compute/instances.go:493 +0x20b
github.com/cloudquery/cloudquery/providers/gcp/compute.(*Client).CollectResource(0xc000431c40, 0xc00033c8e8, 0x9, 0x5b57720, 0x0, 0xc00043c990, 0xc)
/go/src/github.com/troian/golang-cross-example/providers/gcp/compute/client.go:42 +0x187
github.com/cloudquery/cloudquery/providers/gcp.(*Provider).collectResource(0xc00031cc80, 0xc00033c8e0, 0x11, 0x5b57720, 0x0, 0x0, 0x0)
/go/src/github.com/troian/golang-cross-example/providers/gcp/provider.go:99 +0x289
github.com/cloudquery/cloudquery/providers/gcp.(*Provider).Run(0xc00031cc80, 0x5b57720, 0xc00038f710, 0xc00031cc80, 0x0)
/go/src/github.com/troian/golang-cross-example/providers/gcp/provider.go:64 +0xd2
github.com/cloudquery/cloudquery/cloudqueryclient.(*Client).Run(0xc00038ed50, 0x5fbdb3e, 0xc, 0xf, 0xc00063fd00)
/go/src/github.com/troian/golang-cross-example/cloudqueryclient/client.go:137 +0x4b9
github.com/cloudquery/cloudquery/cmd.glob..func2(0x720b4c0, 0x72518c8, 0x0, 0x0, 0x0, 0x0)
/go/src/github.com/troian/golang-cross-example/cmd/fetch.go:22 +0xae
github.com/spf13/cobra.(*Command).execute(0x720b4c0, 0x72518c8, 0x0, 0x0, 0x720b4c0, 0x72518c8)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:850 +0x47c
github.com/spf13/cobra.(*Command).ExecuteC(0x720ba00, 0x4008965, 0xc00010e058, 0x0)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:958 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:895
github.com/cloudquery/cloudquery/cmd.Execute()
/go/src/github.com/troian/golang-cross-example/cmd/root.go:22 +0x31
main.main()
/go/src/github.com/troian/golang-cross-example/main.go:8 +0x25
Using the provided config.yml example, I get the above error immediately. Iteratively commenting out components, I can get it to run:
providers:
- name: gcp
project_id: my-gcp-project
resources:
# - name: compute.instances
- name: compute.autoscalers
- name: compute.disk_types
- name: compute.images
- name: compute.interconnects
# - name: compute.ssl_certificates
# - name: compute.vpn_gateways
# - name: iam.project_roles
- name: iam.service_accounts
- name: storage.buckets
➜ ~ ./cloudquery fetch
2020-12-17T14:10:51.616-0500 INFO Fetched resources {"provider": "gcp", "resource": "compute.addresses", "count": 0}
2020-12-17T14:10:52.215-0500 INFO Fetched resources {"provider": "gcp", "resource": "compute.disk_types", "count": 412}
2020-12-17T14:10:52.359-0500 INFO Fetched resources {"provider": "gcp", "resource": "compute.images", "count": 0}
2020-12-17T14:10:52.518-0500 INFO Fetched resources {"provider": "gcp", "resource": "compute.interconnects", "count": 0}
2020-12-17T14:10:52.968-0500 INFO Fetched resources {"provider": "gcp", "resource": "iam.service_accounts", "count": 17}
2020-12-17T14:10:58.918-0500 INFO Fetched resources {"provider": "gcp", "resource": "storage.buckets", "count": 23}
I have the generated config for AWS, and I see that ECS and ECR are configured, but after running fetch, no data is in the database for those two services.
Currently ECR images are fetched by first calling DescribeRepository in order to set the appropriate input arguments for DescribeImages. However, the repository structure itself, which contains details such as tag immutability, image scanning config, and encryption, is not persisted to the database for querying.
For example, one could query which ECR repositories allow mutable images.
We structure our GCP setup by heavily using Projects to organize things. Currently we have about 120 separate projects.
Since cloudquery's config has project_id, I would need to generate a config with 120 stanzas to allow it to fetch all of our data. Obviously I can hack that together using the output from gcloud projects list and some scripting, but it would be much nicer if there were a clean way to just tell cloudquery to fetch all of them or some subset of them.
I'm happy to contribute this kind of functionality, but I wanted to first make sure that it would be the kind of contribution that you'd be willing to take and what kind of preferences you had for how it should be configured.
E.g., an obvious approach would be to add a projects_filter field to the config that would (if present) replace the project_id field and tell cloudquery to fetch data for all the projects that match it. Either some kind of regex/glob matching or handling the same filter syntax that gcloud projects list --filter=.... accepts.
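If a glob flavor were chosen, the matching itself is small. A minimal sketch, assuming a hypothetical projects_filter value and a project list already fetched from the Resource Manager API (filterProjects is an illustrative name, not an existing cloudquery function):

```go
package main

import (
	"fmt"
	"path"
)

// filterProjects returns the projects whose IDs match the glob pattern,
// standing in for the proposed projects_filter config behavior.
func filterProjects(all []string, pattern string) ([]string, error) {
	var out []string
	for _, p := range all {
		ok, err := path.Match(pattern, p)
		if err != nil {
			return nil, err // invalid pattern
		}
		if ok {
			out = append(out, p)
		}
	}
	return out, nil
}

func main() {
	projects := []string{"team-a-prod", "team-a-dev", "team-b-prod", "shared-infra"}
	matched, _ := filterProjects(projects, "team-a-*")
	fmt.Println(matched)
}
```

Supporting the gcloud --filter syntax instead would mean reimplementing (or shelling out for) its expression language, so glob/regex is probably the cheaper first step.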
The other part of that functionality would also be to add a gcp_project resource, since GCP projects have their own metadata (e.g. we use tags on projects to organize things per team/customer/etc) and I would probably want to be able to query that metadata the same as with compute/storage/etc resources.
When using postgres as a driver and running fetch for the second time I'm getting a constraint error from cloudquery.
I'm running:
CQ_DRIVER=postgresql CQ_DSN=postgresql://localhost:5432/cloudquery cloudquery fetch
The first time it runs nicely, the second time I get the following error:
Error: ERROR: constraint "fk_aws_redshift_cluster_parameter_group_statuses_cluster_parame" for relation "aws_redshift_cluster_parameter_statuses" already exists (SQLSTATE 42710)
For compliance purposes, it's important to pull down all of the AWS ECS services and details. Mainly, you need to get the list of services for each cluster (ECS list-services API), then the details for each service (ECS describe-services API).
The most important items in the service details are:
the networkConfiguration section, including security group IDs and subnets
the loadbalancers section: target ARN (You also need a structure for target groups; this is the link between a service and the load balancer it uses.)
With the information above, you can run a query to find which services are exposed privately or publicly, and with what security group rules.
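As a fetching detail, describe-services accepts at most 10 services per call, so the ARNs returned by list-services have to be batched. A minimal sketch of just the batching step (chunk is an illustrative helper, not cloudquery code; the actual API calls are omitted):

```go
package main

import "fmt"

// chunk splits the service ARNs returned by ECS list-services into batches;
// describe-services accepts at most 10 services per request, so a fetcher
// would call it once per batch.
func chunk(arns []string, size int) [][]string {
	var batches [][]string
	for len(arns) > 0 {
		n := size
		if len(arns) < n {
			n = len(arns)
		}
		batches = append(batches, arns[:n])
		arns = arns[n:]
	}
	return batches
}

func main() {
	// 23 hypothetical service ARNs -> 3 describe-services calls (10+10+3)
	arns := make([]string, 23)
	for i := range arns {
		arns[i] = fmt.Sprintf("service/svc-%d", i)
	}
	fmt.Println(len(chunk(arns, 10)))
}
```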
Looks like another exception needs to be handled for continued processing:
UnsupportedOperationException: arn:aws:kms:us-east-2:XXXXXX:key/xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx origin is EXTERNAL which is not valid for this operation.
Would it make sense to treat a kubernetes cluster as a "cloud" and expose information about workloads in a similar way as with aws/gcp?
Currently a version is added at build time for cloudquery core only. We need to add a version for providers and add a method to the interface.
If fetching data from multiple AWS accounts via roles, would it be possible to run each account fetch concurrently? Or at least somehow batch the operations? If you have many accounts, it takes quite a long time to fetch all the data, since it goes through every account, every region, and every resource sequentially.
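A bounded fan-out over accounts could look like the sketch below, assuming a hypothetical per-account fetchAccount step (all names here are illustrative, not cloudquery's internals):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fetchAccount stands in for a per-account fetch (assume the role, then
// iterate regions and resources); here it only counts the call.
func fetchAccount(roleARN string, fetched *int64) {
	atomic.AddInt64(fetched, 1)
}

// fetchAll fans out across accounts with at most `parallel` fetches in
// flight, instead of walking the accounts sequentially.
func fetchAll(roleARNs []string, parallel int) int64 {
	var fetched int64
	sem := make(chan struct{}, parallel)
	var wg sync.WaitGroup
	for _, arn := range roleARNs {
		wg.Add(1)
		sem <- struct{}{} // blocks while `parallel` fetches are running
		go func(arn string) {
			defer wg.Done()
			defer func() { <-sem }()
			fetchAccount(arn, &fetched)
		}(arn)
	}
	wg.Wait()
	return fetched
}

func main() {
	accounts := []string{
		"arn:aws:iam::111111111111:role/Reader",
		"arn:aws:iam::222222222222:role/Reader",
		"arn:aws:iam::333333333333:role/Reader",
	}
	fmt.Println(fetchAll(accounts, 2))
}
```

The semaphore bound matters because fully unbounded concurrency across accounts, regions, and resources would hit the AWS API rate limits discussed elsewhere in these issues.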
👋 When running cloudquery against AWS resources using MySQL I get the below error
Error 1059: Identifier name 'fk_aws_redshift_cluster_parameter_group_statuses_cluster_parameter_status_list' is too long
MySQL has a 64 character limit on column names and it looks like that is not adjustable: https://dev.mysql.com/doc/refman/5.7/en/identifier-length.html
Maybe column names should be truncated depending on providers? Or as a configuration option?
Hi,
Thanks for your effort creating this amazing tool, but I think I found an issue that I don't know how to fix. After fetching the data from AWS, I tried to perform a query with the tool but it is not working; here is the log trace:
[3.085ms] [rows:0] CREATE VIEW aws_log_metric_filter_and_alarm AS SELECT * FROM aws_cloudtrail_trails
JOIN aws_cloudtrail_trail_event_selectors on aws_cloudtrail_trails.id = aws_cloudtrail_trail_event_selectors.trail_id
JOIN aws_cloudwatchlogs_metric_filters on aws_cloudtrail_trails.cloud_watch_logs_log_group_name = aws_cloudwatchlogs_metric_filters.log_group_name
JOIN aws_cloudwatch_metric_alarm_metrics on aws_cloudwatchlogs_metric_filters.filter_name = aws_cloudwatch_metric_alarm_metrics.name
JOIN aws_cloudwatch_metric_alarms on aws_cloudwatch_metric_alarm_metrics.metric_alarm_id = aws_cloudwatch_metric_alarms.id
JOIN aws_cloudwatch_metric_alarm_actions ON aws_cloudwatch_metric_alarm_metrics.id = aws_cloudwatch_metric_alarm_actions.metric_alarm_id
JOIN aws_sns_subscriptions ON aws_cloudwatch_metric_alarm_actions.value = aws_sns_subscriptions.topic_arn
WHERE is_multi_region_trail=true AND is_logging=true
AND include_management_events=true AND read_write_type = 'All'
AND subscription_arn LIKE 'aws:arn:%'
Error: ERROR: relation "aws_cloudwatchlogs_metric_filters" does not exist (SQLSTATE 42P01)
The table aws_cloudwatchlogs_metric_filters doesn't exist, even in the SchemaSpy page.
Thanks
I'm trying to use CloudQuery to automate some AWS compliance work. For this task it's relevant to know which policies belong to which groups, which users belong to which groups, etc., in the end to answer the question of who can access which resources.
Cloudquery can fetch these separate things, but not the relationships between them. In SQL terminology, I guess that would mean join tables like users_groups that contain a row for each (user, group) pair.
Hi,
I'm trying to use cloudquery to extract some resources for compliance purposes. I'm fetching IAM data for 23 accounts, so including all policies, users, etc.
After a seemingly random amount of time but always within a couple of minutes, I get the following error:
2020/12/21 15:34:19 Throttling: Rate exceeded
status code: 400, request id: [left this one out]
UPDATE:
Just did a bit more research and a workaround is to comment out the accounts in the config.yml and then run fetch in batches of ~3 accounts.
Using cloudquery version 0.4.3
When configuring multiple roles it doesn't seem like cq is even trying to assume the roles:
providers:
- name: aws
log_level: debug
accounts:
- role_arn: arn:aws:iam::123452641799:role/Administrator
- role_arn: arn:aws:iam::123455796699:role/Administrator
regions:
- eu-west-1
version: latest
resources:
- name: iam.policies
- name: iam.roles
- name: iam.users
I'm only getting data using fetch from the current account my access keys are valid for. Manually assuming a role on the CLI works fine.
The policy generated by cloudquery gen policy aws_cis is not compatible with Postgres in several ways:
false instead of 0
(now() - '30 days'::interval)
Maybe the cloudquery gen command could take the same --driver flag and, based on that, switch which policy file it outputs?
From what I've gathered in my first skim, it seems like the logging standards haven't been completely defined yet, but their central theme is around zap.Logger and its typed wrappers and tricks that provide zero-allocation, efficient logging.
However, I think that the real slowdown within this code will be related to round-trip time for specific queries, optimization of queries to third-party platforms, and ultimately rate limiting and dealing with API errors in a graceful way.
As I continue to make contributions to this code base, would it be frowned upon if I deviated from the strict yet performant zap.Logger in favor of the only slightly less performant zap sugared logger?
I created a config using
cloudquery gen config aws
followed by
cloudquery fetch
I also added this in config.yml:
- us-east-1
- us-west-2
- eu-west-1
- eu-west-2
The output of the command is:
.\cloudquery_Windows_x86_64.exe fetch
2020-12-28T15:03:15.392+0500 INFO Creating tables if needed {"provider": "aws"}
2020-12-28T15:03:15.438+0500 INFO No regions specified in config.yml. Assuming all 20 regions {"provider": "aws"}
2020-12-28T15:03:28.254+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "region": "eu-west-1", "resource": "fsx.backups", "count": 0}
2020-12-28T15:03:28.303+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "region": "eu-west-1", "resource": "auto_scaling.launch_configurations", "count": 0}
2020-12-28T15:03:28.336+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "region": "eu-west-1", "resource": "elasticbeanstalk.environments", "count": 0}
2020-12-28T15:03:28.357+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "region": "eu-west-1", "resource": "efs.filesystems", "count": 0}
2020-12-28T15:03:28.391+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "region": "eu-west-1", "resource": "rds.certificates", "count": 1}
2020-12-28T15:03:28.437+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "region": "eu-west-1", "resource": "rds.clusters", "count": 0}
2020-12-28T15:03:28.459+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "region": "eu-west-1", "resource": "rds.subnet_groups", "count": 1}
2020-12-28T15:03:28.465+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "resource": "iam.groups", "count": 2}
2020-12-28T15:03:28.469+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "region": "eu-west-1", "resource": "cloudtrail.trails", "count": 0}
2020-12-28T15:03:28.470+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "region": "eu-west-1", "resource": "emr.clusters", "count": 0}
2020/12/28 15:03:28 NoSuchEntity: The Password Policy with domain name 333870179331 cannot be found.
status code: 404, request id: ef255e51-7a8c-4e85-aa68-d926500dfbe1
I have S3 buckets in different regions, but none are shown when using the command
`SELECT * from aws_s3_buckets`
The table is empty.
I am not sure whether this change was correct:
c6cddae#diff-ec36749d97d730e19702baa9729041e766dd4134df981ae67702657c0abd515b
Because in 0.4.2 I get this when fetching s3 buckets:
2020/12/17 00:04:54 AuthorizationHeaderMalformed: The authorization header is malformed;
the region 'us-east-1' is wrong; expecting 'eu-central-1'
In the current implementation, CloudQuery wipes tables for each selected resource during execution. This prevents tracking the state of an object throughout history.
One potential solution would be to include a timestamp column on each resource that is the same for all resources tracked in a given execution of CloudQuery. You could then query all resources from a given snapshot, or query a single resource over time.
Then it would be great to include options such as only keeping x versions of a resource, or only keeping resources less than y hours old.
If primary keys are set on resource ID, they might need to be shifted to a joint key of resource ID and timestamp. This would mean that you couldn't have two instances of CloudQuery fetch running with the same timestamp, but that likely isn't an issue.
It would be useful to add support for the following GKE-related resources:
Providers define their resource types under internal packages rather than exporting them, making them inaccessible for querying with gorm.
When using cloudquery as a library, it would make sense to expose these types in order to reuse gorm for querying.
AWS IAM Users are loaded via GetCredentialReport. Credential reports can only be generated every 4 hours, so resources might not be picked up if you created a user within the last 4 hours.
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_getting-report.html
Hi, great work on cloudquery!
I'm looking to add a new feature to fetch VPC and subnet details as well; is there any work in progress on this one?
I would like to contribute, it would help for my use case.
Hi, I would like to have the relevant data to visualise relationships between different VPCs in a peering connection setup. This would require cloudquery to fetch details for VPC peering connections.
The details that I would need are the ones documented on this page.
Similar to #16, but for Redshift subnet groups, by further calling DescribeClusterSubnetGroups after fetching details for Redshift clusters.
It would help us to know the subnet IDs for each Redshift subnet group.
Can cloudquery support GCR just like it supports ECR?
As a gcp and aws user
I want to have data about both ecr (aws) and gcr (gcp)
so that I can analyze container patterns in both aws and gcp.
Cloudquery currently supports ECR (AWS Docker registry) images; however, it does not support GCR (GCP).
As reference, cloudquery currently creates the following ECR-related tables.
Thank you for making this great project open source.
A panic occurs when fetching using version 0.11.6:
$ cloudquery gen config aws
$ cloudquery init
$ cloudquery fetch -v --dsn "host=localhost user=postgres password=pass DB.name=postgres port=5432"
11:08AM INF logging configured consoleLog=true fileLogging=true fileName=cloudquery.log jsonLogOutput=false logDirectory=. maxAgeInDays=3 maxBackups=3 maxSizeMB=30 verbose=true
11:08AM DBG reading configuration file path=./config.yml
11:08AM DBG verifying provider plugin is registered pluginName=aws version=latest
11:08AM DBG provider plugin is registered pluginName=aws version=latest
11:08AM DBG downloading checksums file path=/Users/n/code/cloudquery/.cq/providers/cloudquery/aws/latest.checksums.txt pluginName=aws url=https://github.com/cloudquery/cq-provider-aws/releases/latest/download/checksums.txt version=latest
11:08AM DBG downloading checksums signature path=/Users/n/code/cloudquery/.cq/providers/cloudquery/aws/latest.checksums.txt pluginName=aws url=https://github.com/cloudquery/cq-provider-aws/releases/latest/download/checksums.txt version=latest
11:08AM DBG verifying checksums signature path=/Users/n/code/cloudquery/.cq/providers/cloudquery/aws/latest.checksums.txt pluginName=aws url=https://github.com/cloudquery/cq-provider-aws/releases/latest/download/checksums.txt version=latest
11:08AM DBG getting or creating provider provider=aws version=latest
11:08AM DBG starting plugin args=["/Users/n/code/cloudquery/.cq/providers/cloudquery/aws/latest-darwin-amd64"] path=/Users/n/code/cloudquery/.cq/providers/cloudquery/aws/latest-darwin-amd64
11:08AM DBG plugin started path=/Users/n/code/cloudquery/.cq/providers/cloudquery/aws/latest-darwin-amd64 pid=45481
11:08AM DBG waiting for RPC address path=/Users/n/code/cloudquery/.cq/providers/cloudquery/aws/latest-darwin-amd64
11:08AM DBG using plugin version=1
11:08AM DBG plugin address address=/var/folders/ml/q9_q22lx6q79p36k08btzvk80000gn/T/plugin652467892 network=unix timestamp=2021-04-13T11:08:42.386+0300
11:08AM INF requesting provider initialize provider=aws version=latest
{"@level":"debug","@message":"creating table if not exists","@timestamp":"2021-04-13T11:08:42.403340+03:00","table":"aws_ec2_subnets"}
{"@level":"debug","@message":"migrating table columns if required","@timestamp":"2021-04-13T11:08:42.405240+03:00","table":"aws_ec2_subnets"}
{"@level":"debug","@message":"creating table relations","@timestamp":"2021-04-13T11:08:42.419408+03:00","table":"aws_ec2_subnets"}
{"@level":"debug","@message":"creating table relation","@timestamp":"2021-04-13T11:08:42.419450+03:00","table":"aws_ec2_subnet_ipv6_cidr_block_association_sets"}
{"@level":"debug","@message":"creating table if not exists","@timestamp":"2021-04-13T11:08:42.419505+03:00","table":"aws_ec2_subnet_ipv6_cidr_block_association_sets"}
{"@level":"debug","@message":"migrating table columns if required","@timestamp":"2021-04-13T11:08:42.421218+03:00","table":"aws_ec2_subnet_ipv6_cidr_block_association_sets"}
{"@level":"debug","@message":"creating table if not exists","@timestamp":"2021-04-13T11:08:42.425437+03:00","table":"aws_eks_clusters"}
{"@level":"debug","@message":"migrating table columns if required","@timestamp":"2021-04-13T11:08:42.435244+03:00","table":"aws_eks_clusters"}
{"@level":"debug","@message":"creating table relations","@timestamp":"2021-04-13T11:08:42.440578+03:00","table":"aws_eks_clusters"}
{"@level":"debug","@message":"creating table relation","@timestamp":"2021-04-13T11:08:42.440601+03:00","table":"aws_eks_cluster_encryption_configs"}
{"@level":"debug","@message":"creating table if not exists","@timestamp":"2021-04-13T11:08:42.440629+03:00","table":"aws_eks_cluster_encryption_configs"}
{"@level":"debug","@message":"migrating table columns if required","@timestamp":"2021-04-13T11:08:42.450134+03:00","table":"aws_eks_cluster_encryption_configs"}
{"@level":"debug","@message":"creating table relation","@timestamp":"2021-04-13T11:08:42.455376+03:00","table":"aws_eks_cluster_logging_cluster_loggings"}
{"@level":"debug","@message":"creating table if not exists","@timestamp":"2021-04-13T11:08:42.455453+03:00","table":"aws_eks_cluster_logging_cluster_loggings"}
{"@level":"debug","@message":"migrating table columns if required","@timestamp":"2021-04-13T11:08:42.467649+03:00","table":"aws_eks_cluster_logging_cluster_loggings"}
{"@level":"debug","@message":"creating table if not exists","@timestamp":"2021-04-13T11:08:42.471567+03:00","table":"aws_elasticbeanstalk_environments"}
{"@level":"debug","@message":"migrating table columns if required","@timestamp":"2021-04-13T11:08:42.472957+03:00","table":"aws_elasticbeanstalk_environments"}
{"@level":"debug","@message":"creating table relations","@timestamp":"2021-04-13T11:08:42.477606+03:00","table":"aws_elasticbeanstalk_environments"}
{"@level":"debug","@message":"creating table relation","@timestamp":"2021-04-13T11:08:42.477620+03:00","table":"aws_elasticbeanstalk_environment_links"}
{"@level":"debug","@message":"creating table if not exists","@timestamp":"2021-04-13T11:08:42.477635+03:00","table":"aws_elasticbeanstalk_environment_links"}
{"@level":"debug","@message":"migrating table columns if required","@timestamp":"2021-04-13T11:08:42.478729+03:00","table":"aws_elasticbeanstalk_environment_links"}
{"@level":"debug","@message":"creating table relation","@timestamp":"2021-04-13T11:08:42.480602+03:00","table":"aws_elasticbeanstalk_environment_resources_load_balancer_listeners"}
{"@level":"debug","@message":"creating table if not exists","@timestamp":"2021-04-13T11:08:42.480648+03:00","table":"aws_elasticbeanstalk_environment_resources_load_balancer_listeners"}
{"@level":"debug","@message":"migrating table columns if required","@timestamp":"2021-04-13T11:08:42.481864+03:00","table":"aws_elasticbeanstalk_environment_resources_load_balancer_listeners"}
11:03AM DBG panic: runtime error: invalid memory address or nil pointer dereference
11:03AM DBG [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x3a3b985]
11:03AM DBG
11:03AM DBG goroutine 23 [running]:
11:03AM DBG github.com/cloudquery/cq-provider-sdk/provider.Migrator.upgradeTable(0x47c41a0, 0xc000116200, 0x47d8d20, 0xc0001520e0, 0x47c3ee0, 0xc000074088, 0xc000393c80, 0x0, 0x0)
11:03AM DBG /home/runner/go/pkg/mod/github.com/cloudquery/[email protected]/provider/migrator.go:52 +0x445
11:03AM DBG github.com/cloudquery/cq-provider-sdk/provider.Migrator.CreateTable(0x47c41a0, 0xc000116200, 0x47d8d20, 0xc0001520e0, 0x47c3ee0, 0xc000074088, 0xc000393c80, 0xc000393a80, 0x0, 0x0)
11:03AM DBG /home/runner/go/pkg/mod/github.com/cloudquery/[email protected]/provider/migrator.go:74 +0x408
11:03AM DBG github.com/cloudquery/cq-provider-sdk/provider.Migrator.CreateTable(0x47c41a0, 0xc000116200, 0x47d8d20, 0xc0001520e0, 0x47c3ee0, 0xc000074088, 0xc000393a80, 0xc000393780, 0x0, 0x0)
11:03AM DBG /home/runner/go/pkg/mod/github.com/cloudquery/[email protected]/provider/migrator.go:86 +0x66b
11:03AM DBG github.com/cloudquery/cq-provider-sdk/provider.Migrator.CreateTable(0x47c41a0, 0xc000116200, 0x47d8d20, 0xc0001520e0, 0x47c3ee0, 0xc000074088, 0xc000393780, 0x0, 0x0, 0x0)
11:03AM DBG /home/runner/go/pkg/mod/github.com/cloudquery/[email protected]/provider/migrator.go:86 +0x66b
11:03AM DBG github.com/cloudquery/cq-provider-sdk/provider.(*Provider).Init(0xc00039a0a0, 0xc000118080, 0xa, 0xc000156000, 0x45, 0x100b401, 0x404de00, 0x4079680)
11:03AM DBG /home/runner/go/pkg/mod/github.com/cloudquery/[email protected]/provider/provider.go:63 +0x19c
11:03AM DBG github.com/cloudquery/cq-provider-sdk/proto.(*GRPCServer).Init(0xc0003d6960, 0x47c3f60, 0xc00010e450, 0xc00013c000, 0xc0003d6960, 0xc00010e450, 0xc0004a0ba0)
11:03AM DBG /home/runner/go/pkg/mod/github.com/cloudquery/[email protected]/proto/grpc.go:43 +0x68
11:03AM DBG github.com/cloudquery/cq-provider-sdk/proto/internal._Provider_Init_Handler(0x4079680, 0xc0003d6960, 0x47c3f60, 0xc00010e450, 0xc00010c480, 0x0, 0x47c3f60, 0xc00010e450, 0xc000138060, 0x55)
11:03AM DBG /home/runner/go/pkg/mod/github.com/cloudquery/[email protected]/proto/internal/plugin_grpc.pb.go:104 +0x214
11:03AM DBG google.golang.org/grpc.(*Server).processUnaryRPC(0xc00039ec40, 0x47d5360, 0xc000482480, 0xc0003a9200, 0xc000382930, 0x55c9820, 0x0, 0x0, 0x0)
11:03AM DBG /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1217 +0x522
11:03AM DBG google.golang.org/grpc.(*Server).handleStream(0xc00039ec40, 0x47d5360, 0xc000482480, 0xc0003a9200, 0x0)
11:03AM DBG /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1540 +0xd05
11:03AM DBG google.golang.org/grpc.(*Server).serveStreams.func1.2(0xc0004c8020, 0xc00039ec40, 0x47d5360, 0xc000482480, 0xc0003a9200)
11:03AM DBG /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:878 +0xa5
11:03AM DBG created by google.golang.org/grpc.(*Server).serveStreams.func1
11:03AM DBG /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:876 +0x1fd
Error: rpc error: code = Unavailable desc = transport is closing
11:03AM DBG received EOF, stopping recv loop err="rpc error: code = Unavailable desc = transport is closing"
11:03AM DBG plugin process exited error="exit status 2" path=/Users/n/code/cloudquery/.cq/providers/cloudquery/aws/latest-darwin-amd64 pid=45396
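The trace shows Migrator.upgradeTable hitting a nil pointer dereference while CreateTable recurses through table relations, which takes the whole plugin process down with SIGSEGV. A hedged sketch of the kind of guard that avoids this (hypothetical types and function, not the actual cq-provider-sdk code):

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-in for the SDK's table definition; the real
// Migrator in cq-provider-sdk walks a *Table and its relations.
type Table struct {
	Name      string
	Relations []*Table
}

var errNilTable = errors.New("nil table definition")

// createTable recurses into relations the way a migrator would, but
// returns an error on a nil pointer instead of letting the dereference
// panic the plugin process.
func createTable(t *Table) error {
	if t == nil {
		return errNilTable
	}
	// ... create or upgrade t.Name here ...
	for _, rel := range t.Relations {
		if err := createTable(rel); err != nil {
			return fmt.Errorf("relation of %s: %w", t.Name, err)
		}
	}
	return nil
}

func main() {
	root := &Table{Name: "aws_eks_clusters", Relations: []*Table{nil}}
	fmt.Println(createTable(root)) // relation of aws_eks_clusters: nil table definition
}
```

Surfacing the error over gRPC would give the CLI a readable message instead of "transport is closing".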
Even if you run the command twice and already have the latest plugin installed, it still downloads the whole plugin.
For anyone looking to utilize the policy packs programmatically or in CI, it would be useful to have the output from the query command in a more easily parsed format, such as JSON or YAML.
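For illustration, marshaling query results is straightforward once rows are in memory. The column names and row shape below are hypothetical (the issue does not specify cloudquery's internal result format), but any map-per-row result set marshals the same way:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resultsToJSON renders policy-check rows as indented JSON so a CI job
// can parse them with jq or a JSON library.
func resultsToJSON(rows []map[string]interface{}) (string, error) {
	b, err := json.MarshalIndent(rows, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Hypothetical policy-pack result rows.
	rows := []map[string]interface{}{
		{"resource_id": "sg-123", "check": "open_ssh", "passed": false},
	}
	out, err := resultsToJSON(rows)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

An `--output json` style flag wired to something like this would let CI pipelines fail builds on specific checks without scraping table output.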
Looks like gen config --force is throwing an error that the file already exists.