cloudquery / cloudquery
The open source high performance ELT framework powered by Apache Arrow
Home Page: https://cloudquery.io
License: Mozilla Public License 2.0
When running cloudquery init, it is sometimes not clear what is happening or why it takes so long (depending on the internet connection). I suggest adding a periodic status print (how much has been downloaded and how much is left).
Good day, team at cloudquery.
Currently cloudquery supports fetching a list of RDS clusters. Each cluster comes with a subnet group field.
We want to know what possible subnets the RDS instances could be deployed in, and that would require cloudquery to fetch subnet group details by further calling the DescribeDBSubnetGroups API.
We should be able to see a list of AWS EC2 subnet IDs that are tied to each RDS cluster subnet group.
I'm trying to use command line flags to specify the driver and dsn, but it doesn't look like it's getting picked up. Or maybe I'm doing it wrong. When using environment variables it does work!
This doesn't work; data is still being put in the SQLite database:
cloudquery fetch --driver postgresql --dsn postgresql://localhost:5432/cloudquery
while this does work:
CQ_DRIVER=postgresql CQ_DSN=postgresql://localhost:5432/cloudquery cloudquery fetch
Using Postgres
./cloudquery fetch --driver postgresql --dsn "host=localhost user=postgres password=<redacted> DB.name=postgres port=5432" --path azure_config.yml
2021-02-01T09:54:06.887-0500 INFO Creating tables if needed {"provider": "azure"}
2021/02/01 09:54:08 /go/src/github.com/troian/golang-cross-example/database/database.go:183 ERROR: ON CONFLICT DO UPDATE requires inference specification or constraint name (SQLSTATE 42601)
[1.785ms] [rows:0] INSERT INTO "azure_resources_group_tags" ("group_id","key1","value1") VALUES (6,'key2','value2'),(6,'key3','value3'),(6,'key4','value4'),(6,'key5','value5'),(6,'key6','value6'),(6,'key7','value7') ON CONFLICT DO UPDATE SET "group_id"="excluded"."group_id"
2021/02/01 09:54:08 /go/src/github.com/troian/golang-cross-example/database/database.go:183 ERROR: ON CONFLICT DO UPDATE requires inference specification or constraint name (SQLSTATE 42601)
[9.274ms] [rows:6] INSERT INTO "azure_resources_groups" ("subscription_id","resource_id","name","type","properties_provisioning_state","location","managed_by") VALUES ('<redacted subscription id>','/subscriptions/<redacted subscription id>/resourceGroups/NetworkWatcherRG','NetworkWatcherRG','Microsoft.Resources/resourceGroups','Succeeded','westus',NULL),('<redacted subscription id>,'/subscriptions/<redacted subscription id>/resourceGroups/group-name,'group-name','Microsoft.Resources/resourceGroups','Succeeded','eastus',NULL),('<redacted subscription id>','/subscriptions/<redacted subscription id>/resourceGroups/group-name','group-name','Microsoft.Resources/resourceGroups','Succeeded','eastus',NULL),('<redacted subscription id>,'/subscriptions/<redacted subscription id>/resourceGroups/group-name','group-name','Microsoft.Resources/resourceGroups','Succeeded','westus',NULL),('<redacted subscription id>','/subscriptions/<redacted subscription id>/resourceGroups/cloud-shell-storage-eastus','cloud-shell-storage-eastus','Microsoft.Resources/resourceGroups','Succeeded','eastus',NULL),('<redacted subscription id>','/subscriptions/<redacted subscription id>/resourceGroups/group-name','group-name','Microsoft.Resources/resourceGroups','Succeeded','westus',NULL) RETURNING "id"
2021-02-01T09:54:08.723-0500 INFO Fetched resources {"provider": "azure", "subscription_id": "<redacted subscription id>", "resource": "resources.groups", "count": 6}
Config:
providers:
- name: azure
subscriptions:
- "<redacted subscription id>"
resources:
- name: resources.groups
When running the initial cloudquery gen config aws command after downloading in Ubuntu (WSL2), it fails. It looks like one of the default settings is not working. Details:
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
$ ./cloudquery gen config aws
11:11PM INF logging configured consoleLog=true fileLogging=true fileName=cloudquery.log jsonLogOutput=false logDirectory=. maxAgeInDays=3 maxBackups=3 maxSizeMB=30 verbose=false
11:11PM ERR failed to get providers configuration error="failed reading reattach config from CQ_REATTACH_PROVIDERS=/home/mike: read /home/mike: is a directory" provider=aws
Error: failed reading reattach config from CQ_REATTACH_PROVIDERS=/home/mike: read /home/mike: is a directory
Usage:
cloudquery gen config [choose one or more providers (aws,gcp,azure,okta,...)] [flags]
Flags:
--append append new providers to existing config file
--force override output
-h, --help help for config
--path string path to output generated config file (default "./config.yml")
Global Flags:
--enableConsoleLog Enable console logging (default true)
--enableFileLogging enableFileLogging makes the framework logging to a file (default true)
--encodeLogsAsJson EncodeLogsAsJson makes the logging framework logging JSON
--logDirectory string Directory to logging to to when file logging is enabled (default ".")
--logFile string Filename is the name of the logfile which will be placed inside the directory (default "cloudquery.log")
--maxAge int MaxAge the max age in days to keep a logfile (default 3)
--maxBackups int MaxBackups the max number of rolled files to keep (default 3)
--maxSize int MaxSize the max size in MB of the logfile before it's rolled (default 30)
--plugin-dir string Directory to save and load CloudQuery plugins from (env: CQ_PLUGIN_DIR) (default "/home/mike")
--reattach-providers string Path to reattach unmanaged plugins, mostly used for testing purposes (env: CQ_REATTACH_PROVIDERS) (default "/home/mike")
-v, --verbose Enable Verbose logging
2021/04/15 23:11:30 failed reading reattach config from CQ_REATTACH_PROVIDERS=/home/mike: read /home/mike: is a directory
uname -a
Darwin YYY 19.6.0 Darwin Kernel Version 19.6.0: Thu Oct 29 22:56:45 PDT 2020; root:xnu-6153.141.2.2~1/RELEASE_X86_64 x86_64
./cloudquery version
Version: 0.4.1
Commit: 057f16e8c81ce6ce92898bf92f9ecdcc099bd96b
Date: 2020-12-15 20:25:48.052503 +0100 CET m=+0.161209429
./cloudquery fetch
<...>
2020-12-15T20:23:48.963+0100 INFO Fetched resources {"provider": "aws", "account_id": "XXX", "region": "eu-central-1", "resource": "ec2.instances", "count": 1}
2020-12-15T20:23:48.965+0100 INFO Fetched resources {"provider": "aws", "account_id": "XXX", "region": "eu-central-1", "resource": "ec2.instances", "count": 1}
2020-12-15T20:23:48.975+0100 INFO Fetched resources {"provider": "aws", "account_id": "XXX", "region": "eu-central-1", "resource": "redshift.clusters", "count": 0}
2020-12-15T20:23:48.983+0100 INFO Fetched resources {"provider": "aws", "account_id": "XXX", "region": "eu-central-1", "resource": "ec2.images", "count": 2}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4a590e6]
goroutine 59 [running]:
github.com/cloudquery/cloudquery/providers/aws/elasticbeanstalk.(*Client).transformEnvironmentResources(0xc000922870, 0x0, 0x0)
/go/src/github.com/troian/golang-cross-example/providers/aws/elasticbeanstalk/enironments.go:127 +0x26
github.com/cloudquery/cloudquery/providers/aws/elasticbeanstalk.(*Client).transformEnvironment(0xc000922870, 0xc001608000, 0x45fc1ba)
/go/src/github.com/troian/golang-cross-example/providers/aws/elasticbeanstalk/enironments.go:150 +0x8c
github.com/cloudquery/cloudquery/providers/aws/elasticbeanstalk.(*Client).transformEnvironments(0xc000922870, 0xc000635238, 0x1, 0x1, 0x0, 0x0, 0xc001604c30)
/go/src/github.com/troian/golang-cross-example/providers/aws/elasticbeanstalk/enironments.go:162 +0x7e
github.com/cloudquery/cloudquery/providers/aws/elasticbeanstalk.(*Client).environments(0xc000922870, 0x5b56a00, 0x0, 0x0, 0x10)
/go/src/github.com/troian/golang-cross-example/providers/aws/elasticbeanstalk/enironments.go:192 +0x2d7
github.com/cloudquery/cloudquery/providers/aws/elasticbeanstalk.(*Client).CollectResource(0xc000922870, 0xc000039291, 0xc, 0x5b56a00, 0x0, 0x1, 0xc0000a3880)
/go/src/github.com/troian/golang-cross-example/providers/aws/elasticbeanstalk/client.go:39 +0xf9
github.com/cloudquery/cloudquery/providers/aws.(*Provider).collectResource(0xc0003c05a0, 0xc000925090, 0xc000039280, 0x1d, 0x5b56a00, 0x0)
/go/src/github.com/troian/golang-cross-example/providers/aws/provider.go:213 +0x1ef
created by github.com/cloudquery/cloudquery/providers/aws.(*Provider).Run
/go/src/github.com/troian/golang-cross-example/providers/aws/provider.go:147 +0x1fd
I love the idea behind this project and think it will be incredibly useful for many organizations. I'm hoping to be a contributor here, so let me know how I can help!
When running the example.config.yml, the ec2.instances resource attempts to create a column named 'capacity_reservation_target_capacity_reservation_resource_group_arn'.
Error: Error 1059: Identifier name 'capacity_reservation_target_capacity_reservation_resource_group_arn' is too long
MySQL has a 64 character limit for identifiers, so we should find a way to identify and compact names > 64 characters long when using the MySQL driver. For the sake of autocomplete and api similarity, I actually think simply taking the first 64 characters will be the best bet in most situations.
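One caveat with taking the first 64 characters: two long names that share a 64-character prefix would collide. A common variant keeps the prefix but appends a short hash of the full name to disambiguate. A minimal sketch (truncateIdentifier is a hypothetical helper, not cloudquery's API):

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

const maxIdentifierLen = 64 // MySQL's identifier length limit

// truncateIdentifier keeps names within the limit. Instead of plain prefix
// truncation, it reserves 9 characters for an underscore plus the first
// 4 bytes of a SHA-1 of the full name, so distinct long names stay distinct.
func truncateIdentifier(name string) string {
	if len(name) <= maxIdentifierLen {
		return name
	}
	sum := sha1.Sum([]byte(name))
	suffix := "_" + hex.EncodeToString(sum[:4]) // 9 characters total
	return name[:maxIdentifierLen-len(suffix)] + suffix
}

func main() {
	long := "capacity_reservation_target_capacity_reservation_resource_group_arn"
	fmt.Println(truncateIdentifier(long), len(truncateIdentifier(long)))
}
```

The trade-off versus plain prefix truncation is slightly less readable names in exchange for collision safety; autocomplete still works on the shared prefix.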
AWS response "UnauthorizedOperation: You are not authorized to perform this operation" doesn't seem to be handled properly. Cloudquery abruptly stops when it gets such a response message:
2021/01/16 17:39:57 UnauthorizedOperation: You are not authorized to perform this operation.
status code: 403, request id: d001de6e-cb48-446e-a402-8fc8c67ff275
It appears that the closure of Rows is neither deferred nor is the last error returned by the function. Please dismiss this if incorrect; otherwise I'll open a PR later to fix this and other resource leaks like it, which can be picked up by static analysis tooling or manually.
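For reference, the pattern being requested looks like the sketch below. It uses a small interface mirroring the relevant methods of database/sql.Rows so it runs without a live database; the real *sql.Rows satisfies the same shape:

```go
package main

import "fmt"

// rows mirrors the subset of database/sql.Rows methods used here, so the
// pattern can be demonstrated without a live database.
type rows interface {
	Next() bool
	Scan(dest ...interface{}) error
	Err() error
	Close() error
}

// collect shows both fixes: Close is deferred so it runs on every return
// path, and Err() is checked after the loop so a mid-iteration error from
// the driver is not silently dropped.
func collect(r rows) (out []string, err error) {
	defer func() {
		if cerr := r.Close(); err == nil {
			err = cerr
		}
	}()
	for r.Next() {
		var s string
		if err := r.Scan(&s); err != nil {
			return nil, err
		}
		out = append(out, s)
	}
	return out, r.Err()
}

// fakeRows is a trivial in-memory stand-in used to exercise collect.
type fakeRows struct {
	vals   []string
	i      int
	closed bool
}

func (f *fakeRows) Next() bool { f.i++; return f.i <= len(f.vals) }
func (f *fakeRows) Scan(dest ...interface{}) error {
	*(dest[0].(*string)) = f.vals[f.i-1]
	return nil
}
func (f *fakeRows) Err() error   { return nil }
func (f *fakeRows) Close() error { f.closed = true; return nil }

func main() {
	f := &fakeRows{vals: []string{"a", "b"}}
	got, err := collect(f)
	fmt.Println(got, err, f.closed)
}
```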
Story
As a report builder
I want to use Google's BigQuery
So that I can benefit from BigQuery's capabilities
Why use BigQuery
Distributor ID: Ubuntu
Description: Ubuntu 20.04 LTS
Release: 20.04
Version: 0.9.6
Commit: 0102920
Date: 2021-02-26 17:19:16.512656639 +0000 UTC m=+0.028906351
Running the setup per the readme, I can fetch successfully and populate a local SQLite DB. After generating the policy for aws_cis, when I run cloudquery query, I get a panic:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0xcd4e3c]
goroutine 1 [running]:
github.com/cloudquery/cloudquery/cloudqueryclient.(*Client).RunQuery(0xc000422300, 0x107c55a, 0xc, 0x0, 0x0, 0xc000422300, 0x0)
/home/ubuntu/sauce/cloudquery/cloudqueryclient/client.go:231 +0x6c
github.com/cloudquery/cloudquery/cmd.glob..func7(0x19601a0, 0x19f9c58, 0x0, 0x0, 0x0, 0x0)
/home/ubuntu/sauce/cloudquery/cmd/query.go:24 +0x316
github.com/spf13/cobra.(*Command).execute(0x19601a0, 0x19f9c58, 0x0, 0x0, 0x19601a0, 0x19f9c58)
/home/ubuntu/go/pkg/mod/github.com/spf13/[email protected]/command.go:850 +0x460
github.com/spf13/cobra.(*Command).ExecuteC(0x1960440, 0x1086c18, 0x18, 0x0)
/home/ubuntu/go/pkg/mod/github.com/spf13/[email protected]/command.go:958 +0x349
github.com/spf13/cobra.(*Command).Execute(...)
/home/ubuntu/go/pkg/mod/github.com/spf13/[email protected]/command.go:895
github.com/cloudquery/cloudquery/cmd.Execute()
/home/ubuntu/sauce/cloudquery/cmd/root.go:24 +0x2d
main.main()
/home/ubuntu/sauce/cloudquery/main.go:14 +0x82
Trying with Postgres, I get the same error. Is there something I've missed?
When running with Postgres as the database, running a second time fails when attempting to update the schema:
2020/12/27 07:48:33 /go/src/github.com/troian/golang-cross-example/providers/aws/ec2/vpc_peering_connections.go:193 ERROR: column "requester_option_allow_egress_from_local_classic_link_to_remote" of relation "aws_ec2_vpc_peering_connections" already exists (SQLSTATE 42701)
[157.994ms] [rows:0] ALTER TABLE "aws_ec2_vpc_peering_connections" ADD "requester_option_allow_egress_from_local_classic_link_to_remote_vpc" boolean
Error: ERROR: column "requester_option_allow_egress_from_local_classic_link_to_remote" of relation "aws_ec2_vpc_peering_connections" already exists (SQLSTATE 42701)
It would be useful to have a policy pack for GCP similar to AWS CIS. Some of the findings from GCP Security Command Center can be helpful for a start: https://cloud.google.com/security-command-center/docs/concepts-vulnerabilities-findings.
Error: Error 1059: Identifier name 'accepter_option_allow_egress_from_local_classic_link_to_remote_vpc' is too long
Hi,
Is encrypting the credentials on the roadmap? I just configured cloudquery to use with Azure, and it is a risk to put the credentials in plain text in environment variables.
Thanks,
Alonso
If you hit the rate limits for a resource API, CloudQuery stops fetching data and errors out. It would be great if the CloudQuery AWS provider would detect this type of error and retry the request.
It would be useful to have an official Docker image which can be used to run the cloudquery tool. This would allow executing the tool using Kubernetes or a Cloud Run service.
➜ ~ uname -a
Darwin MacBook-Pro.local 19.5.0 Darwin Kernel Version 19.5.0: Tue May 26 20:41:44 PDT 2020; root:xnu-6153.121.2~2/RELEASE_X86_64 x86_64
➜ ~ ./cloudquery version
Version: 0.4.6
Commit: 96e6541ee8c8a86e8bb4b580bf7bb526ce3ec68a
Date: 2020-12-17 14:03:28.112132 -0500 EST m=+0.011338626
➜ ~ ./cloudquery gen config gcp
➜ ~ ./cloudquery fetch
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x530fbcd]
goroutine 1 [running]:
github.com/cloudquery/cloudquery/providers/gcp/compute.(*Client).transformInstance(0xc000431c40, 0xc000147200, 0x9)
/go/src/github.com/troian/golang-cross-example/providers/gcp/compute/instances.go:420 +0x1ad
github.com/cloudquery/cloudquery/providers/gcp/compute.(*Client).transformInstances(0xc000431c40, 0xc0005ee520, 0x2, 0x4, 0x0, 0x0, 0x0)
/go/src/github.com/troian/golang-cross-example/providers/gcp/compute/instances.go:446 +0x7e
github.com/cloudquery/cloudquery/providers/gcp/compute.(*Client).instances(0xc000431c40, 0x5b57720, 0x0, 0x5e5a7e0, 0x0)
/go/src/github.com/troian/golang-cross-example/providers/gcp/compute/instances.go:493 +0x20b
github.com/cloudquery/cloudquery/providers/gcp/compute.(*Client).CollectResource(0xc000431c40, 0xc00033c8e8, 0x9, 0x5b57720, 0x0, 0xc00043c990, 0xc)
/go/src/github.com/troian/golang-cross-example/providers/gcp/compute/client.go:42 +0x187
github.com/cloudquery/cloudquery/providers/gcp.(*Provider).collectResource(0xc00031cc80, 0xc00033c8e0, 0x11, 0x5b57720, 0x0, 0x0, 0x0)
/go/src/github.com/troian/golang-cross-example/providers/gcp/provider.go:99 +0x289
github.com/cloudquery/cloudquery/providers/gcp.(*Provider).Run(0xc00031cc80, 0x5b57720, 0xc00038f710, 0xc00031cc80, 0x0)
/go/src/github.com/troian/golang-cross-example/providers/gcp/provider.go:64 +0xd2
github.com/cloudquery/cloudquery/cloudqueryclient.(*Client).Run(0xc00038ed50, 0x5fbdb3e, 0xc, 0xf, 0xc00063fd00)
/go/src/github.com/troian/golang-cross-example/cloudqueryclient/client.go:137 +0x4b9
github.com/cloudquery/cloudquery/cmd.glob..func2(0x720b4c0, 0x72518c8, 0x0, 0x0, 0x0, 0x0)
/go/src/github.com/troian/golang-cross-example/cmd/fetch.go:22 +0xae
github.com/spf13/cobra.(*Command).execute(0x720b4c0, 0x72518c8, 0x0, 0x0, 0x720b4c0, 0x72518c8)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:850 +0x47c
github.com/spf13/cobra.(*Command).ExecuteC(0x720ba00, 0x4008965, 0xc00010e058, 0x0)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:958 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:895
github.com/cloudquery/cloudquery/cmd.Execute()
/go/src/github.com/troian/golang-cross-example/cmd/root.go:22 +0x31
main.main()
/go/src/github.com/troian/golang-cross-example/main.go:8 +0x25
Using the provided config.yml example, I get the above error immediately. Iteratively commenting out components, I can get it to run:
providers:
- name: gcp
project_id: my-gcp-project
resources:
# - name: compute.instances
- name: compute.autoscalers
- name: compute.disk_types
- name: compute.images
- name: compute.interconnects
# - name: compute.ssl_certificates
# - name: compute.vpn_gateways
# - name: iam.project_roles
- name: iam.service_accounts
- name: storage.buckets
➜ ~ ./cloudquery fetch
2020-12-17T14:10:51.616-0500 INFO Fetched resources {"provider": "gcp", "resource": "compute.addresses", "count": 0}
2020-12-17T14:10:52.215-0500 INFO Fetched resources {"provider": "gcp", "resource": "compute.disk_types", "count": 412}
2020-12-17T14:10:52.359-0500 INFO Fetched resources {"provider": "gcp", "resource": "compute.images", "count": 0}
2020-12-17T14:10:52.518-0500 INFO Fetched resources {"provider": "gcp", "resource": "compute.interconnects", "count": 0}
2020-12-17T14:10:52.968-0500 INFO Fetched resources {"provider": "gcp", "resource": "iam.service_accounts", "count": 17}
2020-12-17T14:10:58.918-0500 INFO Fetched resources {"provider": "gcp", "resource": "storage.buckets", "count": 23}
I have the generated config for AWS, and I see that ECS and ECR are configured, but after running fetch, no data is in the database for those two services.
Currently ECR images are fetched by first calling DescribeRepository in order to set the appropriate input arguments for DescribeImages. However, the repository structure itself, which contains details such as tag immutability, image scanning config, and encryption, is not persisted to the database for querying.
For example, one could query which ECR repositories allow mutable images.
We structure our GCP setup by heavily using Projects to organize things. Currently we have about 120 separate projects.
Since cloudquery's config has project_id, I would need to generate a config with 120 stanzas to allow it to fetch all of our data. Obviously I can hack that together using the output from gcloud projects list and some scripting, but it would be much nicer if there were a clean way to just tell cloudquery to fetch all of them or some subset of them.
I'm happy to contribute this kind of functionality, but I wanted to first make sure that it would be the kind of contribution that you'd be willing to take and what kind of preferences you had for how it should be configured.
E.g., an obvious approach would be to add a projects_filter field to the config that would (if present) replace the project_id field and tell cloudquery to fetch data for all the projects that match it. Either some kind of regex/glob matching or handling the same filter syntax that gcloud projects list --filter=.... accepts.
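If a glob flavor were chosen, the matching itself is small. A minimal sketch, assuming a hypothetical projects_filter value and a project list already fetched from the Resource Manager API (filterProjects is an illustrative name, not an existing cloudquery function):

```go
package main

import (
	"fmt"
	"path"
)

// filterProjects returns the projects whose IDs match the glob pattern,
// standing in for the proposed projects_filter config behavior.
func filterProjects(all []string, pattern string) ([]string, error) {
	var out []string
	for _, p := range all {
		ok, err := path.Match(pattern, p)
		if err != nil {
			return nil, err // invalid pattern
		}
		if ok {
			out = append(out, p)
		}
	}
	return out, nil
}

func main() {
	projects := []string{"team-a-prod", "team-a-dev", "team-b-prod", "shared-infra"}
	matched, _ := filterProjects(projects, "team-a-*")
	fmt.Println(matched)
}
```

Supporting the gcloud --filter syntax instead would mean reimplementing (or shelling out for) its expression language, so glob/regex is probably the cheaper first step.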
The other part of that functionality would also be to add a gcp_project resource, since GCP projects have their own metadata (e.g. we use tags on projects to organize things per team/customer/etc) and I would probably want to be able to query that metadata the same as with compute/storage/etc resources.
When using postgres as a driver and running fetch for the second time I'm getting a constraint error from cloudquery.
I'm running:
CQ_DRIVER=postgresql CQ_DSN=postgresql://localhost:5432/cloudquery cloudquery fetch
The first time it runs nicely, the second time I get the following error:
Error: ERROR: constraint "fk_aws_redshift_cluster_parameter_group_statuses_cluster_parame" for relation "aws_redshift_cluster_parameter_statuses" already exists (SQLSTATE 42710)
For compliance purposes, it's important to pull down all of the AWS ECS services and details. Mainly, you need to get the list of services for each cluster (ECS list-services API), then the details for each service (ECS describe-services API).
The most important items in the service details are:
the networkConfiguration section, including security group IDs and subnets
the loadbalancers section: target ARN (You also need a structure for target groups; this is the link between a service and the load balancer it uses.)
With the information above, you can run a query to find which services are exposed privately or publicly, and with what security group rules.
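As a fetching detail, describe-services accepts at most 10 services per call, so the ARNs returned by list-services have to be batched. A minimal sketch of just the batching step (chunk is an illustrative helper, not cloudquery code; the actual API calls are omitted):

```go
package main

import "fmt"

// chunk splits the service ARNs returned by ECS list-services into batches;
// describe-services accepts at most 10 services per request, so a fetcher
// would call it once per batch.
func chunk(arns []string, size int) [][]string {
	var batches [][]string
	for len(arns) > 0 {
		n := size
		if len(arns) < n {
			n = len(arns)
		}
		batches = append(batches, arns[:n])
		arns = arns[n:]
	}
	return batches
}

func main() {
	// 23 hypothetical service ARNs -> 3 describe-services calls (10+10+3)
	arns := make([]string, 23)
	for i := range arns {
		arns[i] = fmt.Sprintf("service/svc-%d", i)
	}
	fmt.Println(len(chunk(arns, 10)))
}
```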
Looks like another exception needs to be handled for continued processing:
UnsupportedOperationException: arn:aws:kms:us-east-2:XXXXXX:key/xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx origin is EXTERNAL which is not valid for this operation.
Would it make sense to treat a kubernetes cluster as a "cloud" and expose information about workloads in a similar way as with aws/gcp?
Currently a version is added at build time for cloudquery core only. We need to add a version for providers and add a method to the interface.
If fetching data from multiple AWS accounts via roles, would it be possible to run each account fetch concurrently? Or at least somehow batch the operations? If you have many accounts, it takes quite a long time to fetch all the data, since it goes through every account, every region, and every resource sequentially.
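A bounded fan-out over accounts could look like the sketch below, assuming a hypothetical per-account fetchAccount step (all names here are illustrative, not cloudquery's internals):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fetchAccount stands in for a per-account fetch (assume the role, then
// iterate regions and resources); here it only counts the call.
func fetchAccount(roleARN string, fetched *int64) {
	atomic.AddInt64(fetched, 1)
}

// fetchAll fans out across accounts with at most `parallel` fetches in
// flight, instead of walking the accounts sequentially.
func fetchAll(roleARNs []string, parallel int) int64 {
	var fetched int64
	sem := make(chan struct{}, parallel)
	var wg sync.WaitGroup
	for _, arn := range roleARNs {
		wg.Add(1)
		sem <- struct{}{} // blocks while `parallel` fetches are running
		go func(arn string) {
			defer wg.Done()
			defer func() { <-sem }()
			fetchAccount(arn, &fetched)
		}(arn)
	}
	wg.Wait()
	return fetched
}

func main() {
	accounts := []string{
		"arn:aws:iam::111111111111:role/Reader",
		"arn:aws:iam::222222222222:role/Reader",
		"arn:aws:iam::333333333333:role/Reader",
	}
	fmt.Println(fetchAll(accounts, 2))
}
```

The semaphore bound matters because fully unbounded concurrency across accounts, regions, and resources would hit the AWS API rate limits discussed elsewhere in these issues.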
👋 When running cloudquery against AWS resources using MySQL I get the below error
Error 1059: Identifier name 'fk_aws_redshift_cluster_parameter_group_statuses_cluster_parameter_status_list' is too long
MySQL has a 64 character limit on column names and it looks like that is not adjustable: https://dev.mysql.com/doc/refman/5.7/en/identifier-length.html
Maybe column names should be truncated depending on providers? Or as a configuration option?
Hi,
Thanks for your effort creating this amazing tool, but I think I found an issue that I don't know how to fix. After fetching the data from AWS, I tried to perform a query with the tool but it is not working; here is the log trace:
[3.085ms] [rows:0] CREATE VIEW aws_log_metric_filter_and_alarm AS SELECT * FROM aws_cloudtrail_trails
JOIN aws_cloudtrail_trail_event_selectors on aws_cloudtrail_trails.id = aws_cloudtrail_trail_event_selectors.trail_id
JOIN aws_cloudwatchlogs_metric_filters on aws_cloudtrail_trails.cloud_watch_logs_log_group_name = aws_cloudwatchlogs_metric_filters.log_group_name
JOIN aws_cloudwatch_metric_alarm_metrics on aws_cloudwatchlogs_metric_filters.filter_name = aws_cloudwatch_metric_alarm_metrics.name
JOIN aws_cloudwatch_metric_alarms on aws_cloudwatch_metric_alarm_metrics.metric_alarm_id = aws_cloudwatch_metric_alarms.id
JOIN aws_cloudwatch_metric_alarm_actions ON aws_cloudwatch_metric_alarm_metrics.id = aws_cloudwatch_metric_alarm_actions.metric_alarm_id
JOIN aws_sns_subscriptions ON aws_cloudwatch_metric_alarm_actions.value = aws_sns_subscriptions.topic_arn
WHERE is_multi_region_trail=true AND is_logging=true
AND include_management_events=true AND read_write_type = 'All'
AND subscription_arn LIKE 'aws:arn:%'
Error: ERROR: relation "aws_cloudwatchlogs_metric_filters" does not exist (SQLSTATE 42P01)
The table aws_cloudwatchlogs_metric_filters doesn't exist, even in the SchemaSpy page.
Thanks
I'm trying to use CloudQuery to automate some AWS compliance work. For this task it's relevant to know which policies belong to which groups, which users belong to which groups, etc., in the end to answer the question of who can access which resources.
Cloudquery can fetch these separate things, but not the relationships between them. In SQL terminology, I guess that would mean join tables like users_groups that contain a row for each (user, group) pair.
Hi,
I'm trying to use cloudquery to extract some resources for compliance purposes. I'm fetching IAM data for 23 accounts, so including all policies, users, etc.
After a seemingly random amount of time but always within a couple of minutes, I get the following error:
2020/12/21 15:34:19 Throttling: Rate exceeded
status code: 400, request id: [left this one out]
UPDATE:
Just did a bit more research and a workaround is to comment out the accounts in the config.yml and then run fetch in batches of ~3 accounts.
Using cloudquery version 0.4.3
When configuring multiple roles it doesn't seem like cq is even trying to assume the roles:
providers:
- name: aws
log_level: debug
accounts:
- role_arn: arn:aws:iam::123452641799:role/Administrator
- role_arn: arn:aws:iam::123455796699:role/Administrator
regions:
- eu-west-1
version: latest
resources:
- name: iam.policies
- name: iam.roles
- name: iam.users
I'm only getting data using fetch from the current account my access keys are valid for. Manually assuming a role on the CLI works fine.
The policy generated by cloudquery gen policy aws_cis is not compatible with Postgres in several ways:
false instead of 0
(now() - '30 days'::interval)
Maybe the cloudquery gen command could take the same --driver flag and, based on that, switch which policy file it outputs?
From what I've gathered in my first skim, it seems like the logging standards haven't been completely defined yet, but their central theme is around zap.Logger and its typed wrappers and tricks that provide zero-allocation, efficient logging.
However, I think that the real slowdown within this code will be related to round-trip time for specific queries, optimization of queries to third-party platforms, and ultimately rate limiting and dealing with API errors in a graceful way.
As I continue to make contributions to this code base, would it be frowned upon if I deviated from the strict yet performant zap.Logger in favor of the only slightly less performant zap sugared logger?
I created a config using
cloudquery gen config aws
followed by
cloudquery fetch
I also added this in config.yml:
- us-east-1
- us-west-2
- eu-west-1
- eu-west-2
The output of the command is:
.\cloudquery_Windows_x86_64.exe fetch
2020-12-28T15:03:15.392+0500 INFO Creating tables if needed {"provider": "aws"}
2020-12-28T15:03:15.438+0500 INFO No regions specified in config.yml. Assuming all 20 regions {"provider": "aws"}
2020-12-28T15:03:28.254+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "region": "eu-west-1", "resource": "fsx.backups", "count": 0}
2020-12-28T15:03:28.303+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "region": "eu-west-1", "resource": "auto_scaling.launch_configurations", "count": 0}
2020-12-28T15:03:28.336+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "region": "eu-west-1", "resource": "elasticbeanstalk.environments", "count": 0}
2020-12-28T15:03:28.357+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "region": "eu-west-1", "resource": "efs.filesystems", "count": 0}
2020-12-28T15:03:28.391+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "region": "eu-west-1", "resource": "rds.certificates", "count": 1}
2020-12-28T15:03:28.437+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "region": "eu-west-1", "resource": "rds.clusters", "count": 0}
2020-12-28T15:03:28.459+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "region": "eu-west-1", "resource": "rds.subnet_groups", "count": 1}
2020-12-28T15:03:28.465+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "resource": "iam.groups", "count": 2}
2020-12-28T15:03:28.469+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "region": "eu-west-1", "resource": "cloudtrail.trails", "count": 0}
2020-12-28T15:03:28.470+0500 INFO Fetched resources {"provider": "aws", "account_id": "333870179331", "region": "eu-west-1", "resource": "emr.clusters", "count": 0}
2020/12/28 15:03:28 NoSuchEntity: The Password Policy with domain name 333870179331 cannot be found.
status code: 404, request id: ef255e51-7a8c-4e85-aa68-d926500dfbe1
I have S3 buckets in different regions, but none are shown when using the command
`SELECT * from aws_s3_buckets`
The table is empty.
I am not sure whether this change was correct:
c6cddae#diff-ec36749d97d730e19702baa9729041e766dd4134df981ae67702657c0abd515b
Because in 0.4.2 I get this when fetching s3 buckets:
2020/12/17 00:04:54 AuthorizationHeaderMalformed: The authorization header is malformed;
the region 'us-east-1' is wrong; expecting 'eu-central-1'
In the current implementation, CloudQuery wipes tables for each selected resource during execution. This prevents tracking the state of an object throughout history.
One potential solution would be to include a timestamp column on each resource that is the same for all resources tracked in a given execution of CloudQuery. You could then query all resources from a given snapshot, or query a single resource over time.
Then it would be great to include options such as only keeping x versions of a resource, or only keeping resources less than y hours old.
If primary keys are set on resource ID, they might need to be shifted to a joint key of resource ID and timestamp. This would mean that you couldn't have two instances of CloudQuery fetch running with the same timestamp, but that likely isn't an issue.
It would be useful to add support for the following GKE-related resources:
Providers define their resource types under internal packages rather than exporting them, making them inaccessible for querying with gorm.
When using cloudquery as a library, it would make sense to expose these types in order to reuse gorm for querying.
AWS IAM Users are loaded via GetCredentialReport. Credential reports can only be generated every 4 hours, so resources might not be picked up if you created a user within the last 4 hours.
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_getting-report.html
Hi, great work on cloudquery!
I'm looking to add a new feature to fetch VPC and subnet details as well; is there any work in progress on this one?
I would like to contribute, it would help for my use case.
Hi, I would like to have the relevant data to visualise relationships between different VPCs in a peering connection setup. This would require cloudquery to fetch details for VPC peering connections.
The details that I would need are the ones documented on this page.
Similar to #16, but for Redshift subnet groups, by further calling DescribeClusterSubnetGroups after fetching details for Redshift clusters.
It would help us to know the subnet IDs for each Redshift subnet group.
Can cloudquery support GCR just like it supports ECR?
As a gcp and aws user
I want to have data about both ecr (aws) and gcr (gcp)
so that I can analyze container patterns in both aws and gcp.
Cloudquery currently supports ECR (AWS Docker registry) images; however, it does not support GCR (GCP).
As reference, cloudquery currently creates the following ECR-related tables.
Thank you for making this great project open source.
A panic occurs when fetching using version 0.11.6:
$ cloudquery gen config aws
$ cloudquery init
$ cloudquery fetch -v --dsn "host=localhost user=postgres password=pass DB.name=postgres port=5432"
11:08AM INF logging configured consoleLog=true fileLogging=true fileName=cloudquery.log jsonLogOutput=false logDirectory=. maxAgeInDays=3 maxBackups=3 maxSizeMB=30 verbose=true
11:08AM DBG reading configuration file path=./config.yml
11:08AM DBG verifying provider plugin is registered pluginName=aws version=latest
11:08AM DBG provider plugin is registered pluginName=aws version=latest
11:08AM DBG downloading checksums file path=/Users/n/code/cloudquery/.cq/providers/cloudquery/aws/latest.checksums.txt pluginName=aws url=https://github.com/cloudquery/cq-provider-aws/releases/latest/download/checksums.txt version=latest
11:08AM DBG downloading checksums signature path=/Users/n/code/cloudquery/.cq/providers/cloudquery/aws/latest.checksums.txt pluginName=aws url=https://github.com/cloudquery/cq-provider-aws/releases/latest/download/checksums.txt version=latest
11:08AM DBG verifying checksums signature path=/Users/n/code/cloudquery/.cq/providers/cloudquery/aws/latest.checksums.txt pluginName=aws url=https://github.com/cloudquery/cq-provider-aws/releases/latest/download/checksums.txt version=latest
11:08AM DBG getting or creating provider provider=aws version=latest
11:08AM DBG starting plugin args=["/Users/n/code/cloudquery/.cq/providers/cloudquery/aws/latest-darwin-amd64"] path=/Users/n/code/cloudquery/.cq/providers/cloudquery/aws/latest-darwin-amd64
11:08AM DBG plugin started path=/Users/n/code/cloudquery/.cq/providers/cloudquery/aws/latest-darwin-amd64 pid=45481
11:08AM DBG waiting for RPC address path=/Users/n/code/cloudquery/.cq/providers/cloudquery/aws/latest-darwin-amd64
11:08AM DBG using plugin version=1
11:08AM DBG plugin address address=/var/folders/ml/q9_q22lx6q79p36k08btzvk80000gn/T/plugin652467892 network=unix timestamp=2021-04-13T11:08:42.386+0300
11:08AM INF requesting provider initialize provider=aws version=latest
{"@level":"debug","@message":"creating table if not exists","@timestamp":"2021-04-13T11:08:42.403340+03:00","table":"aws_ec2_subnets"}
{"@level":"debug","@message":"migrating table columns if required","@timestamp":"2021-04-13T11:08:42.405240+03:00","table":"aws_ec2_subnets"}
{"@level":"debug","@message":"creating table relations","@timestamp":"2021-04-13T11:08:42.419408+03:00","table":"aws_ec2_subnets"}
{"@level":"debug","@message":"creating table relation","@timestamp":"2021-04-13T11:08:42.419450+03:00","table":"aws_ec2_subnet_ipv6_cidr_block_association_sets"}
{"@level":"debug","@message":"creating table if not exists","@timestamp":"2021-04-13T11:08:42.419505+03:00","table":"aws_ec2_subnet_ipv6_cidr_block_association_sets"}
{"@level":"debug","@message":"migrating table columns if required","@timestamp":"2021-04-13T11:08:42.421218+03:00","table":"aws_ec2_subnet_ipv6_cidr_block_association_sets"}
{"@level":"debug","@message":"creating table if not exists","@timestamp":"2021-04-13T11:08:42.425437+03:00","table":"aws_eks_clusters"}
{"@level":"debug","@message":"migrating table columns if required","@timestamp":"2021-04-13T11:08:42.435244+03:00","table":"aws_eks_clusters"}
{"@level":"debug","@message":"creating table relations","@timestamp":"2021-04-13T11:08:42.440578+03:00","table":"aws_eks_clusters"}
{"@level":"debug","@message":"creating table relation","@timestamp":"2021-04-13T11:08:42.440601+03:00","table":"aws_eks_cluster_encryption_configs"}
{"@level":"debug","@message":"creating table if not exists","@timestamp":"2021-04-13T11:08:42.440629+03:00","table":"aws_eks_cluster_encryption_configs"}
{"@level":"debug","@message":"migrating table columns if required","@timestamp":"2021-04-13T11:08:42.450134+03:00","table":"aws_eks_cluster_encryption_configs"}
{"@level":"debug","@message":"creating table relation","@timestamp":"2021-04-13T11:08:42.455376+03:00","table":"aws_eks_cluster_logging_cluster_loggings"}
{"@level":"debug","@message":"creating table if not exists","@timestamp":"2021-04-13T11:08:42.455453+03:00","table":"aws_eks_cluster_logging_cluster_loggings"}
{"@level":"debug","@message":"migrating table columns if required","@timestamp":"2021-04-13T11:08:42.467649+03:00","table":"aws_eks_cluster_logging_cluster_loggings"}
{"@level":"debug","@message":"creating table if not exists","@timestamp":"2021-04-13T11:08:42.471567+03:00","table":"aws_elasticbeanstalk_environments"}
{"@level":"debug","@message":"migrating table columns if required","@timestamp":"2021-04-13T11:08:42.472957+03:00","table":"aws_elasticbeanstalk_environments"}
{"@level":"debug","@message":"creating table relations","@timestamp":"2021-04-13T11:08:42.477606+03:00","table":"aws_elasticbeanstalk_environments"}
{"@level":"debug","@message":"creating table relation","@timestamp":"2021-04-13T11:08:42.477620+03:00","table":"aws_elasticbeanstalk_environment_links"}
{"@level":"debug","@message":"creating table if not exists","@timestamp":"2021-04-13T11:08:42.477635+03:00","table":"aws_elasticbeanstalk_environment_links"}
{"@level":"debug","@message":"migrating table columns if required","@timestamp":"2021-04-13T11:08:42.478729+03:00","table":"aws_elasticbeanstalk_environment_links"}
{"@level":"debug","@message":"creating table relation","@timestamp":"2021-04-13T11:08:42.480602+03:00","table":"aws_elasticbeanstalk_environment_resources_load_balancer_listeners"}
{"@level":"debug","@message":"creating table if not exists","@timestamp":"2021-04-13T11:08:42.480648+03:00","table":"aws_elasticbeanstalk_environment_resources_load_balancer_listeners"}
{"@level":"debug","@message":"migrating table columns if required","@timestamp":"2021-04-13T11:08:42.481864+03:00","table":"aws_elasticbeanstalk_environment_resources_load_balancer_listeners"}
11:03AM DBG panic: runtime error: invalid memory address or nil pointer dereference
11:03AM DBG [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x3a3b985]
11:03AM DBG
11:03AM DBG goroutine 23 [running]:
11:03AM DBG github.com/cloudquery/cq-provider-sdk/provider.Migrator.upgradeTable(0x47c41a0, 0xc000116200, 0x47d8d20, 0xc0001520e0, 0x47c3ee0, 0xc000074088, 0xc000393c80, 0x0, 0x0)
11:03AM DBG /home/runner/go/pkg/mod/github.com/cloudquery/[email protected]/provider/migrator.go:52 +0x445
11:03AM DBG github.com/cloudquery/cq-provider-sdk/provider.Migrator.CreateTable(0x47c41a0, 0xc000116200, 0x47d8d20, 0xc0001520e0, 0x47c3ee0, 0xc000074088, 0xc000393c80, 0xc000393a80, 0x0, 0x0)
11:03AM DBG /home/runner/go/pkg/mod/github.com/cloudquery/[email protected]/provider/migrator.go:74 +0x408
11:03AM DBG github.com/cloudquery/cq-provider-sdk/provider.Migrator.CreateTable(0x47c41a0, 0xc000116200, 0x47d8d20, 0xc0001520e0, 0x47c3ee0, 0xc000074088, 0xc000393a80, 0xc000393780, 0x0, 0x0)
11:03AM DBG /home/runner/go/pkg/mod/github.com/cloudquery/[email protected]/provider/migrator.go:86 +0x66b
11:03AM DBG github.com/cloudquery/cq-provider-sdk/provider.Migrator.CreateTable(0x47c41a0, 0xc000116200, 0x47d8d20, 0xc0001520e0, 0x47c3ee0, 0xc000074088, 0xc000393780, 0x0, 0x0, 0x0)
11:03AM DBG /home/runner/go/pkg/mod/github.com/cloudquery/[email protected]/provider/migrator.go:86 +0x66b
11:03AM DBG github.com/cloudquery/cq-provider-sdk/provider.(*Provider).Init(0xc00039a0a0, 0xc000118080, 0xa, 0xc000156000, 0x45, 0x100b401, 0x404de00, 0x4079680)
11:03AM DBG /home/runner/go/pkg/mod/github.com/cloudquery/[email protected]/provider/provider.go:63 +0x19c
11:03AM DBG github.com/cloudquery/cq-provider-sdk/proto.(*GRPCServer).Init(0xc0003d6960, 0x47c3f60, 0xc00010e450, 0xc00013c000, 0xc0003d6960, 0xc00010e450, 0xc0004a0ba0)
11:03AM DBG /home/runner/go/pkg/mod/github.com/cloudquery/[email protected]/proto/grpc.go:43 +0x68
11:03AM DBG github.com/cloudquery/cq-provider-sdk/proto/internal._Provider_Init_Handler(0x4079680, 0xc0003d6960, 0x47c3f60, 0xc00010e450, 0xc00010c480, 0x0, 0x47c3f60, 0xc00010e450, 0xc000138060, 0x55)
11:03AM DBG /home/runner/go/pkg/mod/github.com/cloudquery/[email protected]/proto/internal/plugin_grpc.pb.go:104 +0x214
11:03AM DBG google.golang.org/grpc.(*Server).processUnaryRPC(0xc00039ec40, 0x47d5360, 0xc000482480, 0xc0003a9200, 0xc000382930, 0x55c9820, 0x0, 0x0, 0x0)
11:03AM DBG /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1217 +0x522
11:03AM DBG google.golang.org/grpc.(*Server).handleStream(0xc00039ec40, 0x47d5360, 0xc000482480, 0xc0003a9200, 0x0)
11:03AM DBG /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1540 +0xd05
11:03AM DBG google.golang.org/grpc.(*Server).serveStreams.func1.2(0xc0004c8020, 0xc00039ec40, 0x47d5360, 0xc000482480, 0xc0003a9200)
11:03AM DBG /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:878 +0xa5
11:03AM DBG created by google.golang.org/grpc.(*Server).serveStreams.func1
11:03AM DBG /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:876 +0x1fd
Error: rpc error: code = Unavailable desc = transport is closing
11:03AM DBG received EOF, stopping recv loop err="rpc error: code = Unavailable desc = transport is closing"
11:03AM DBG plugin process exited error="exit status 2" path=/Users/n/code/cloudquery/.cq/providers/cloudquery/aws/latest-darwin-amd64 pid=45396
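The trace shows Migrator.upgradeTable hitting a nil pointer dereference while CreateTable recurses through table relations, which takes the whole plugin process down with SIGSEGV. A hedged sketch of the kind of guard that avoids this (hypothetical types and function, not the actual cq-provider-sdk code):

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-in for the SDK's table definition; the real
// Migrator in cq-provider-sdk walks a *Table and its relations.
type Table struct {
	Name      string
	Relations []*Table
}

var errNilTable = errors.New("nil table definition")

// createTable recurses into relations the way a migrator would, but
// returns an error on a nil pointer instead of letting the dereference
// panic the plugin process.
func createTable(t *Table) error {
	if t == nil {
		return errNilTable
	}
	// ... create or upgrade t.Name here ...
	for _, rel := range t.Relations {
		if err := createTable(rel); err != nil {
			return fmt.Errorf("relation of %s: %w", t.Name, err)
		}
	}
	return nil
}

func main() {
	root := &Table{Name: "aws_eks_clusters", Relations: []*Table{nil}}
	fmt.Println(createTable(root)) // relation of aws_eks_clusters: nil table definition
}
```

Surfacing the error over gRPC would give the CLI a readable message instead of "transport is closing".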
Even if you run the command twice and already have the latest plugin installed, it still downloads the whole plugin.
For anyone looking to utilize the policy packs programmatically or in CI, it would be useful to have the output from the query command in a more easily parsed format, such as JSON or YAML.
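For illustration, marshaling query results is straightforward once rows are in memory. The column names and row shape below are hypothetical (the issue does not specify cloudquery's internal result format), but any map-per-row result set marshals the same way:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resultsToJSON renders policy-check rows as indented JSON so a CI job
// can parse them with jq or a JSON library.
func resultsToJSON(rows []map[string]interface{}) (string, error) {
	b, err := json.MarshalIndent(rows, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Hypothetical policy-pack result rows.
	rows := []map[string]interface{}{
		{"resource_id": "sg-123", "check": "open_ssh", "passed": false},
	}
	out, err := resultsToJSON(rows)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

An `--output json` style flag wired to something like this would let CI pipelines fail builds on specific checks without scraping table output.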
Looks like gen config --force is throwing an error that the file already exists.