
dremio-cloner's People

Contributors

chufe-dremio, deane-dremio, jeff-99, mxmarg, tejkm, tokoko


dremio-cloner's Issues

Writing permissions to DCS Project

Using Cloner to write to a target DCS project, it appears the acl_transformation_rbac.json file is mandatory for migrating permissions, even if no access or permissions are being changed.

In the absence of a transformation file, the PUT operation (write) fails with the following error:
ERROR:2023-08-08 11:24:17,489:_process_acl: Source User de3711ce-5367-4cdf-9b37-f7f4e8d01ecd not found in the target Dremio Environment. ACL Entry cannot be processed as per ignore_missing_acl_user configuration. space:DeepakSpace

Including an ACL transformation file as shown below fixed the issue.

{"acl-transformation": [
  {"source": {"user": "[email protected]"}, "target": {"user": "[email protected]"}}
]}

If I am not transforming any permissions, why should a transformation file be required?
The workaround is cumbersome: the Cloner/Dremio admin must consolidate a list of all users/roles from the source and either build an acl_transformation file that includes all of those roles/users, or generate SQL to grant privileges from sys.organization.users or sys.users on the source Dremio cluster.
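Until this is addressed, the manual effort can at least be scripted: a minimal sketch that generates an identity acl-transformation file from an exported user list (the input file name is a placeholder; the output matches the shape shown above):

import json

# Placeholder input: one user name per line, exported from the source cluster
# (e.g., from sys.organization.users or sys.users).
with open("source_users.txt") as f:
    users = [line.strip() for line in f if line.strip()]

# Identity mapping: every source user maps to itself on the target.
transformations = [{"source": {"user": u}, "target": {"user": u}} for u in users]

with open("acl_transformation_rbac.json", "w") as f:
    json.dump({"acl-transformation": transformations}, f, indent=2)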

space.folder.filter.paths config is not being respected while performing reads

The folder filter path below is not applied when performing a get operation using config_read_dir.json:
{"space.folder.filter.paths": [""]},

For example, say I have a space named 'my_space' containing a folder called 'trades'. With the config below, dremio-cloner should only pull the objects located within this folder.

	{"space.filter": "*"},
	{"space.filter.names": ["my_space"]},
	{"space.exclude.filter": ""},
	{"space.folder.filter": "*"},
	{"space.folder.filter.paths": ["trades"]},
	{"space.folder.exclude.filter":""},

However, it currently pulls all the objects located in the space; the expected subtree behavior is sketched below.
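For reference, here is the subtree semantics the config appears to promise, sketched with Python's fnmatch (an illustration of the expected behavior, not Cloner's actual matching code):

from fnmatch import fnmatch

folder_paths = ["trades", "trades/fx", "reference_data"]
filter_paths = ["trades"]  # value from space.folder.filter.paths

# Expected: keep a folder when it, or any of its ancestors, matches a filter path.
def keep(path):
    parts = path.split("/")
    prefixes = ["/".join(parts[:i + 1]) for i in range(len(parts))]
    return any(fnmatch(pref, pat) for pref in prefixes for pat in filter_paths)

print([p for p in folder_paths if keep(p)])  # ['trades', 'trades/fx']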

Dremio Cloner unable to deploy one VDS (ln_dly_calc_fact)

I am opening this on behalf of Pawan Teja at Fannie Mae ("Nyshadham, Pawan Teja x (Contractor)" [email protected]).

Description:

We had an earlier deployment at the end of October, and the team is unable to push this VDS (LN_dly_calc_fact) to the Prod and Acpt environments.

Attempts: the dev team has tried various scenarios to push the code, including removing comments and reducing the number of lines in the sqlScript, all unsuccessful.

Below are the new attributes of the VDS that could not be deployed:

Line 201: Loan_Final_Additional_Tier_1_Cost_Of_Capital_Basis_Point_Rate
Line 225: Loan_Final_Tier_2_Cost_Of_Capital_Basis_Point_Rate

Please see the attached files:

fnm_config_write_dir (1).json
import_10_31_2023_13_53_20 (2).log
LN_DLY_CALC_FACT_VW (1).txt

Pawan originally opened this via a support ticket, but Max suggested a GitHub issue.

Unable to read/get a single VDS only

I tried to simply download the definition of a single VDS. However, it looks like dremio-cloner downloads all the folders along with the single VDS.

Here is my folder structure:
[screenshot: folder structure]

Here is the output that I get:
[screenshot: results output directory]
In the results output directory, I want to see inherit.json but not the other two folders, mk and ck (both actually contain VDSs, but dremio-cloner downloads only the empty folders).

What would be the right config to use here in order to get only the single VDS definition (only the inherit.json file as output)? Below is the desired output:
[screenshot: desired output]

Along with generating unnecessary folders, the run also takes a long time when the number of folders is high (time is spent making API calls for each folder).
Below is the config file used (config_read_dir.json):

{"dremio_cloner": [
  {"command":"get"},
  {"source": [
	{"endpoint": "https://dremio.nonprod.com/"},
	{"username": "dremio-local-admin"},
	{"password": "****"},
	{"verify_ssl": "True"},
	{"is_community_edition": "False"},
	{"graph_api_support": "True"}]
  },
{"target": [
	{"directory":"results"},
	{"overwrite": "True"}]
	},
	{"options": [
	{"logging.level":"logging.DEBUG"},
	{"logging.format":"%(levelname)s:%(asctime)s:%(message)s"},
	{"logging.filename":"read_log"},
	{"logging.verbose": "False"},

	{"max_errors":"9999"},
	{"http_timeout":"10"},

	{"user.process_mode":"skip"},
	{"group.process_mode":"skip"},
	{"space.process_mode":"skip"},
	{"source.process_mode":"skip"},
	{"reflection.process_mode": "skip"},
	{"wlm.queue.process_mode": "skip"},
	{"wlm.rule.process_mode": "skip"},
	{"wiki.process_mode": "skip"},
	{"tag.process_mode": "skip"},
	{"home.process_mode": "skip"},
	{"vote.process_mode": "skip"},
	{"folder.process_mode": "skip"},
	{"vds.process_mode": "process"},
	{"pds.process_mode": "skip"},

	{"space.filter": "*"},
	{"space.filter.names": ["CICD"]},
	{"space.exclude.filter": ""},
	{"space.folder.filter":"*"},
	{"space.folder.filter.paths": []},
	{"space.folder.exclude.filter":""},

	{"source.filter":"*"},
	{"source.filter.names": []},
	{"source.filter.types": []},
	{"source.exclude.filter":""},
	{"source.folder.filter":"*"},
	{"source.folder.filter.paths": []},
	{"source.folder.exclude.filter":""},

	{"pds.filter":"*"},
	{"pds.filter.names": []},
	{"pds.exclude.filter":""},
	{"pds.list.useapi":"False"},

	{"vds.filter":"*"},
	{"vds.filter.names": ["inherit"]},
	{"vds.exclude.filter":""},
	{"vds.dependencies.process_mode":"ignore"},

	{"reflection.only_for_matching_vds":"True"}]
	}]
}
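If the goal is just the SQL of one VDS, a direct call to Dremio's catalog-by-path REST API sidesteps the folder walk entirely. A minimal sketch (the token handling and the dataset path segment are placeholders; adjust the path to wherever inherit actually lives under CICD):

import requests

BASE = "https://dremio.nonprod.com"  # endpoint from the config above
TOKEN = "<auth-token>"               # placeholder, e.g. obtained via POST /apiv2/login
HEADERS = {"Authorization": "_dremio" + TOKEN}

# Fetch a single dataset by its full path, e.g. CICD/.../inherit.
resp = requests.get(BASE + "/api/v3/catalog/by-path/CICD/inherit",
                    headers=HEADERS, timeout=10)
resp.raise_for_status()
print(resp.json().get("sql"))  # the VDS definition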

Dremio Cloner with Cloud: ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None)

I am trying to test Dremio Cloner with Cloud. I have followed the readme at https://github.com/deane-dremio/dremio-cloner/blob/master/README.md.

However, when attempting to perform a PUT to Cloud I encounter the following error:

File "C:\Python\Lib\site-packages\requests\adapters.py", line 501, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

Here is an excerpt from the config_write_dir.json file:

{"dremio_cloner": [
{"command":"put"},
{"target": [
{"endpoint": "http://api.eu.dremio.cloud/"},
{"username": ""},
{"password": ""},
{"verify_ssl": "True"},
{"is_community_edition": "False"},
{"is_dremio_cloud": "True"},
{"dremio_cloud_org_id": "#########################"},
{"dremio_cloud_project_id": "###############"}
]
},
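One quick way to isolate the problem is to hit the Cloud API directly with the same credentials. A minimal sketch (token and project ID are placeholders; note it uses https, whereas the excerpt above points at http://api.eu.dremio.cloud/, and Dremio Cloud's API is served over TLS):

import requests

ENDPOINT = "https://api.eu.dremio.cloud"  # https, not http
PAT = "<personal-access-token>"           # placeholder
PROJECT_ID = "<project-id>"               # placeholder

resp = requests.get(ENDPOINT + "/v0/projects/" + PROJECT_ID + "/catalog",
                    headers={"Authorization": "Bearer " + PAT},
                    timeout=10)
print(resp.status_code, resp.text[:200])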

RecursionError: maximum recursion depth exceeded for a put operation

Hello Folks,

I have used dremio-cloner successfully in the past for source migration. I tried to use it again to migrate an S3 source, but this time ran into a couple of errors. I am using Dremio version 22.1.7 integrated with Active Directory.
Log:
INFO:2023-05-16 14:48:25,032:Executing command 'put'.
WARNING:2023-05-16 14:48:25,651:_process_acl: Source User 30489f5d-678a-4129-a7ad-6becbbc425ca not found in the target Dremio Environment. User is removed from ACL definition as per ignore_missing_acl_user configuration. space:Samson
Error from console
Traceback (most recent call last):
  File "/Users/s.eromonsei/dremio-cloner/src/dremio_cloner.py", line 159, in <module>
    main()
  File "/Users/s.eromonsei/dremio-cloner/src/dremio_cloner.py", line 49, in main
    put_dremio_environment(config)
  File "/Users/s.eromonsei/dremio-cloner/src/dremio_cloner.py", line 96, in put_dremio_environment
    writer.write_dremio_environment()
  File "/Users/s.eromonsei/dremio-cloner/src/DremioWriter.py", line 90, in write_dremio_environment
    self._write_space(space, self._config.space_process_mode, self._config.space_ignore_missing_acl_user, self._config.space_ignore_missing_acl_group)
  File "/Users/s.eromonsei/dremio-cloner/src/DremioWriter.py", line 126, in _write_space
    return self._write_entity(entity, process_mode, ignore_missing_acl_user_flag, ignore_missing_acl_group_flag)
  File "/Users/s.eromonsei/dremio-cloner/src/DremioWriter.py", line 312, in _write_entity
    updated_entity = self._dremio_env.update_catalog_entity(entity['id'], entity, self._config.dry_run, report_error)
  File "/Users/s.eromonsei/dremio-cloner/src/Dremio.py", line 250, in update_catalog_entity
    return self._api_put_json(self._catalog_url + entity_id, entity, source="update_catalog_entity", report_error = report_error)
  File "/Users/s.eromonsei/dremio-cloner/src/Dremio.py", line 443, in _api_put_json
    return self._api_put_json(url, json_data, source, report_error, False)
  File "/Users/s.eromonsei/dremio-cloner/src/Dremio.py", line 443, in _api_put_json
    return self._api_put_json(url, json_data, source, report_error, False)
  File "/Users/s.eromonsei/dremio-cloner/src/Dremio.py", line 443, in _api_put_json
    return self._api_put_json(url, json_data, source, report_error, False)
  [Previous line repeated 967 more times]
  File "/Users/s.eromonsei/dremio-cloner/src/Dremio.py", line 430, in _api_put_json
    response = requests.request("PUT", self._endpoint + url, json=json_data, headers=self._headers, timeout=self._api_timeout, verify=self._verify_ssl)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/urllib3/connectionpool.py", line 790, in urlopen
    response = self._make_request(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/urllib3/connectionpool.py", line 536, in _make_request
    response = conn.getresponse()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/urllib3/connection.py", line 454, in getresponse
    httplib_response = super().getresponse()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1322, in getresponse
    response.begin()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 327, in begin
    self.headers = self.msg = parse_headers(self.fp)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 221, in parse_headers
    return email.parser.Parser(_class=_class).parsestr(hstring)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/parser.py", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/parser.py", line 56, in parse
    feedparser.feed(data)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/feedparser.py", line 176, in feed
    self._call_parse()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/feedparser.py", line 180, in _call_parse
    self._parse()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/feedparser.py", line 295, in _parsegen
    if self._cur.get_content_maintype() == 'message':
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/message.py", line 594, in get_content_maintype
    ctype = self.get_content_type()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/_policybase.py", line 316, in header_fetch_parse
    return self._sanitize_header(name, value)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/_policybase.py", line 287, in _sanitize_header
    if _has_surrogates(value):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/email/utils.py", line 57, in _has_surrogates
    s.encode()
RecursionError: maximum recursion depth exceeded while calling a Python object
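The repeated frame at Dremio.py line 443 shows the retry re-entering _api_put_json recursively with no depth bound, so a persistently failing PUT eventually exhausts Python's recursion limit. A bounded-loop alternative, sketched as a standalone helper (an illustration of the fix, not the project's actual code):

import requests

def put_json_with_retry(endpoint, url, json_data, headers,
                        timeout=10, verify=True, max_retries=3):
    """PUT with a bounded retry loop instead of unbounded recursion."""
    for attempt in range(1, max_retries + 1):
        try:
            response = requests.request("PUT", endpoint + url, json=json_data,
                                        headers=headers, timeout=timeout,
                                        verify=verify)
        except requests.RequestException:
            if attempt == max_retries:
                raise  # give up after the final attempt
            continue
        if response.status_code < 500 or attempt == max_retries:
            return response  # success, client error, or out of retries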

GET reflections from Dremio CE

Test env is Dremio AWSE CE 24.0.0 and 24.1.4.

The Dremio "GET" operation did not pick up the reflections even when {"reflection.process_mode": "process"}.
I also tried this with {"reflection.only_for_matching_vds":"True"} and {"reflection.filter_mode": "apply_vds_pds_filter"}

The only workaround I could get working is cumbersome: supplying an explicit list of reflection IDs to migrate:
{"reflection.id_include_list": ["8bf8d3dd-b3c7-47f7-879d-eef43765b061"]}

Allow non-admin users to run the tool

In version 24.1.0, DX-60480 was fixed, which prevents non-admin users from calling /api/v3/users/{id} (unless they also have the CREATE USER privilege).
This causes Cloner to fail with

{
    "errorMessage": "User not allowed to get details of other user",
    "moreInfo": ""
}

The workaround is to grant the privilege with GRANT CREATE USER ON SYSTEM TO USER <username>, but some people don't want to allow CI/CD teams to create users.

The request is to change the cloner tool to use APIs that are runnable by non-admin users and users who don't have the CREATE USER privilege.

Issue handling reflections

Hi,

I'm trying this project for the first time, and I am seeing an error:

python dremio_cloner.py ..\test_read.json
Traceback (most recent call last):
  File "C:\PythonProjects\dremio-cloner\src\dremio_cloner.py", line 159, in <module>
    main()
  File "C:\PythonProjects\dremio-cloner\src\dremio_cloner.py", line 47, in main
    get_dremio_environment(config)
  File "C:\PythonProjects\dremio-cloner\src\dremio_cloner.py", line 78, in get_dremio_environment
    dremio_data = reader.read_dremio_environment()
  File "C:\PythonProjects\dremio-cloner\src\DremioReader.py", line 57, in read_dremio_environment
    self._read_reflections()
  File "C:\PythonProjects\dremio-cloner\src\DremioReader.py", line 281, in _read_reflections
    reflections = self._dremio_env.list_reflections()['data']
TypeError: 'NoneType' object is not subscriptable

If I change the reflections setting in the JSON config from process to skip, the error goes away.

I'm running against Dremio 18.1.0 Community Edition
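The failing line subscripts the result of list_reflections() without checking it, and on 18.1.0 CE that call evidently returns None. A minimal sketch of a guard (a hypothetical fix, not the project's actual code):

def safe_list_reflections(dremio_env):
    """Return the reflection list, or [] when the environment yields none.

    Hypothetical guard for DremioReader._read_reflections: Community Edition
    can return no payload from the reflections API, so avoid ['data'] on None.
    """
    payload = dremio_env.list_reflections()
    if payload is None:
        return []
    return payload.get("data", [])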

Deploying VDS that have Wiki details

Hello,

I am using the dremio-cloner script to deploy my Dremio environment, and some of my virtual datasets have Wiki content. I encounter the following errors during deployment:

DEBUG:2024-02-06 14:13:41,939:_write_wiki: processing wiki: {'entity_id': 'xxx', 'path': ['SELF_SERVICE_PROJECTS', 'DCOG_PROJECT', 'SIMILARWEB', 'SEGMENT_TRAFFIC_AND_ENGAGEMENT'], 'text': ''}
DEBUG:2024-02-06 14:13:41,959:https://xxx.com:443 "GET /api/v3/catalog/by-path/SELF_SERVICE_PROJECTS/DCOG_PROJECT/SIMILARWEB/SEGMENT_TRAFFIC_AND_ENGAGEMENT HTTP/1.1" 404 148
INFO:2024-02-06 14:13:41,959:get_catalog_entity_by_path: received HTTP Response Code 404 for : <api/v3/catalog/by-path/SELF_SERVICE_PROJECTS/DCOG_PROJECT/SIMILARWEB/SEGMENT_TRAFFIC_AND_ENGAGEMENT> errorMessage: Could not find entity with path [[SELF_SERVICE_PROJECTS, DCOG_PROJECT, SIMILARWEB, SEGMENT_TRAFFIC_AND_ENGAGEMENT]] moreInfo:
ERROR:2024-02-06 14:13:41,959:_write_wiki: Unable to resolve wiki's dataset for {'entity_id': 'xxx', 'path': ['SELF_SERVICE_PROJECTS', 'DCOG_PROJECT', 'SIMILARWEB', 'SEGMENT_TRAFFIC_AND_ENGAGEMENT'], 'text': ''}
ERROR:2024-02-06 14:13:41,959:_write_wiki: Unable to resolve wiki's dataset for {'entity_id': 'xxx', 'path': ['SELF_SERVICE_PROJECTS', 'DCOG_PROJECT', 'SIMILARWEB', 'SEGMENT_TRAFFIC_AND_ENGAGEMENT'], 'text': ''}

Dependency resolving causes an infinite loop on a valid VDS definition

In one of our systems we have the following VDS that causes an infinite loop in dependency resolving.

The VDS is named Staging.TOS.Container.Container, and the query roughly looks like this:

WITH CONTAINER AS ( SELECT ... )
SELECT *
FROM CONTAINER
WHERE X = 1 

This raised a Python recursion depth exception while processing the VDS.
Changing the CTE name as follows resolved the issue:

WITH CONTAINER_BASE AS ( SELECT ... )
SELECT *
FROM CONTAINER_BASE
WHERE X = 1 

The initial query is perfectly valid, so in my opinion it should not cause an issue when syncing the script to source control.
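A plausible explanation: if dependency extraction treats every FROM/JOIN reference as an external dataset, a CTE named CONTAINER inside a VDS whose own path ends in Container resolves back to itself and recurses forever. A minimal sketch of extraction that excludes WITH-clause names first (a hypothetical helper, not the project's actual parser):

import re

def referenced_tables(sql):
    """Crude FROM/JOIN reference extraction, minus names bound by WITH.

    Illustrates why 'WITH CONTAINER AS (...) ... FROM CONTAINER' must not be
    treated as a dependency on a dataset called CONTAINER.
    """
    cte_names = {m.group(1).upper() for m in
                 re.finditer(r"(?:WITH|,)\s*(\w+)\s+AS\s*\(", sql, re.I)}
    refs = {m.group(1).upper() for m in
            re.finditer(r"(?:FROM|JOIN)\s+([\w.]+)", sql, re.I)}
    return refs - cte_names

print(referenced_tables("WITH CONTAINER AS (SELECT 1) SELECT * FROM CONTAINER"))
# -> set()  (the CTE reference is not treated as a dependency)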

Unable to deploy to more than one level of folders

In our Dremio space we have the workspace, plus the root folder, plus additional folders.
Example: BI_PROJECTS.XXX1.XXX2.XXX3
We can't deploy to the level of the XXX3 folder, only at the XXX1 level.
This is an issue because multiple developments are happening in XXX1-level folders, and this causes conflicts since there are still non-existing dependencies.
Is there currently a solution for that? A possible workaround is sketched below.
thanks
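One possible workaround, sketched under the assumption that pre-creating the missing folder levels via the catalog API lets the deploy land at the deeper level (endpoint and token are placeholders; the path is from the example above):

import requests

BASE = "https://dremio.example.com"  # placeholder
HEADERS = {"Authorization": "_dremio<token>",  # placeholder token
           "Content-Type": "application/json"}

# Pre-create each folder level so a deploy into BI_PROJECTS.XXX1.XXX2.XXX3 can land.
path = []
for part in ["BI_PROJECTS", "XXX1", "XXX2", "XXX3"]:
    path.append(part)
    if len(path) == 1:
        continue  # the space itself must already exist
    resp = requests.post(BASE + "/api/v3/catalog", headers=HEADERS,
                         json={"entityType": "folder", "path": list(path)})
    if resp.status_code not in (200, 409):  # 409: folder already exists
        resp.raise_for_status()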
